Foreword



Introduction 

In computer science, grammars are human-constructed formalisms that are 
meant to define languages, in particular programming languages or (in theoretical
computer science) formal languages. This description is often partial: it is
not unusual to see a formal description of the syntactic structure of a
language, while the semantic part remains ill-defined. Finding the syntactic
structure of a program (a sentence in the language) is part of the compilation 
process of a program. The construction of this structure is called parsing and 
the result of the parsing process is a hierarchical account of the elements that 
make up the program. This account makes it possible to assign semantics to 
a program. 

Formal syntactic descriptions of languages were first given by the linguist 
Noam Chomsky. Because the descriptions were formal, the languages were 
also formal: sequences of symbols that satisfied descriptions based on finite 
state automata or regular grammars, context-free grammars, or context sen- 
sitive grammars. In Chomsky’s view, these descriptions were the starting 
point for descriptions of the syntactic part of natural, human spoken, lan- 
guages. Moreover, these descriptions would allow the assignment of meaning 
to sentences. Interestingly, at about the same time Chomsky introduced dif- 
ferent classes of grammars and languages, a committee defining a program- 
ming language (ALGOL) introduced a programming language description 
called Backus-Naur-Form (BNF) which turned out to be equivalent to one 
of Chomsky’s grammar classes, the so-called context-free grammars. We are 
talking about the period 1959-1961. 

In Chomsky’s view, human language grammars were not human-con- 
structed formalisms. The rules of the formalism, or, more generally, the prin- 
ciples that determine the rules, are supposed to be innate. This view led to 
a distinction between competence and performance in human language use. 
Each language user has a language competence that allows him to construct 
all kinds of sentences using the rules of a grammar. Constructing sentences 
can be compared with using rules to compute a multiplication or a division 
in arithmetic. Language users can construct sentences using rules of syntax.
Due to environmental circumstances in normal man-to-man communication,
these rules are not always obeyed. Performance differs from competence. 

It is much easier, however, to formalize self-chosen rules of sentence con- 
struction and analysis than to formalize actual language behaviour. For this 
obvious reason, grammar formalisms and their parsing methods have drawn
so much attention from computational linguists. But it should be admitted, on
the other hand, that nowadays powerful language processing systems can be 
built that are based on these formalisms. Whether or not the used formalisms 
meet certain linguistic principles in some way or other, or even some princi- 
ples of human language innateness, is not the main concern of those doing 
research and development in this area. 

Parsing Methods 

Parsing methods have been defined for all kinds of language descriptions. 
Formal descriptions define formal languages. After the introduction of the 
well-known Chomsky hierarchy in the late 1950s and early 1960s we see a 
common interest of computer scientists and computational linguists in pars- 
ing methods for context-free languages. The quest for efficient parsing meth- 
ods led to polynomial-time algorithms for general context-free grammars in 
the middle and late 1960s. In computer science, however, these formalisms 
were thought to be unnecessarily general for describing the syntactic proper- 
ties of programming languages, and therefore to be unnecessarily inefficient. 
LL and LR grammars and parsing methods were introduced. These required 
only linear time and were sufficiently general for dealing with the syntactic 
backbone of programming languages. Interest in general context-free methods 
diminished, or was left to theoretical computer scientists. In computational 
linguistics there were other reasons to become critical of the context-free 
grammar formalism. Its descriptional adequacy, that is, its ability to cover 
linguistic generalities in a natural way, was considered to be too weak. It was 
also doubted whether it provides sufficient generative capacity. The LL and 
LR approaches favoured in computer science were clearly much less suitable, 
because these do not allow representation of syntactic ambiguities. 

It is remarkable that in the late 1970s and early 1980s we see a growing 
interest in LR-like methods and context-free grammars in computational linguistics
and a growing interest in general context-free grammar descriptions
in computer science. How can this be explained? 

In computational linguistics, first of all, the so-called ‘determinism hy- 
pothesis’ attracted a lot of attention. The idea is that in general people do 
not ‘backtrack’ while analysing a sentence. Backtracking becomes necessary 
only when a started analysis cannot be continued at some point in the sen- 
tence. Mitch Marcus introduced an LR-like ‘wait and see’ stack formalism 
in order to parse sentences ‘deterministically’. Reviewing the literature from 
that period, one sees lots of misconceptions and confusion among researchers. 
Apparently these are partly due to lack of knowledge about formal parsing
methods such as, for example, Earley’s method and how issues like ‘back- 
tracking’, ‘determinism’, and ‘efficiency’ relate to these algorithms. Since 
then, however, maybe due to changes in university curricula and the rise 
of a new generation of computational linguists, knowledge of formal methods 
has become more widespread. This can also be illustrated with the intro- 
duction of formalisms like Lexical Functional Grammar (LFG), Generalized 
Phrase Structure Grammar (GPSG), Head-Driven Phrase Structure Gram- 
mar (HPSG), Unification Formalisms, Definite Clause Grammars and Tree 
Adjoining Grammars (TAGs) in the early 1980s. It led to a new discussion 
on the question whether the generative capacity of context-free formalisms 
would suffice to describe the syntax of natural languages and it led to a sys- 
tematic comparison of grammar formalisms, yielding weakly context-sensitive 
languages as a recently discovered class for which adequate generative capac- 
ity is claimed. 

The formalisms mentioned above are certainly much more general than 
a pure context-free formalism. However, their backbone is context-free or 
the way the formalisms are defined and used bears very much resemblance
to the context-free paradigm. These approaches should be contrasted with 
more traditional approaches in computational linguistics using so-called aug- 
mented transition networks or chart parsing methods. It would be interesting 
to investigate how the availability of the Prolog implementation mechanism 
has influenced the direction of research in computational linguistics. 



Shift of Attention 

We have not yet mentioned one of the main influences that caused researchers 
in computational linguistics and natural language processing to shift their at- 
tention to existing formal parsing methods and possible extensions of these 
methods. That influence was the increasing demand of society, military and 
funding organisations to produce research results that could be used to build 
tools and systems for practical natural language processing applications, such
as speech understanding systems, natural language interfaces to
information systems, machine translation of texts, information retrieval, help
systems for complex software and machinery, knowledge extraction from documents,
text image processing, and so on. The availability of a comprehensive
cognitive and linguistic theory does not seem to be a precondition for appli- 
cations in the area of natural language understanding. Many applications do 
not require this comprehensive theory. Moreover, many applications can be 
built using research results that are not influenced in any way by cognitive, 
psycho-linguistic or linguistic principles. Generalized LR parsing, introduced 
in the mid-1980s by Masaru Tomita, is such a method that was introduced as 
a simple, straightforward and efficient parsing method for general context-free 
languages and grammars. Due to its straightforwardness, like the deterministic
LR method, it has attracted a lot of attention and it has been used in
many natural language applications. 

In computer science, as was stated above, more and more attention has
been devoted to general context-free parsing methods. The just mentioned 
generalized LR method, for example, has been used in grammar, parser and 
compiler development environments. In general, software engineering envi- 
ronments may offer their users syntax-dependent tools. The compiler con- 
struction level is only one level, and a rather low one, where descriptions 
based on formal grammar play a role. Furthermore, computer science is a 
growing science in the sense that borders between so-called ‘pure’ computer 
science and several application areas are disappearing. Grammars and parsing 
methods play a role in pattern recognition, they have been used to describe 
and analyse command and action languages (human interaction with a com- 
puter system through key presses, cursor movements, etc.), to describe screen 
layouts, etc. Human factors have become important in computer science. In- 
creasing the accessibility of computer systems through the use of speech and 
natural language in the man-machine interface is an aim worth pursuing. It is 
obvious that computer scientists and computational linguists will meet each 
other here and that they can learn from each other’s methods to deal with 
languages. 

Finally, we would like to mention another influence that caused computer 
scientists to go back to general context-free parsing methods. Parallelism is 
the keyword here. The introduction of new types of machine architectures 
and the possibility to implement algorithms on single chips have led to new 
research on existing parsing algorithms. Some research has been purely the- 
oretical, in the sense that all kinds of efficiency limits were explored. Some 
research has been practical, in the sense that all kinds of extensions and vari- 
ations of existing parsing algorithms have been investigated in order to make 
them suitable for parallel implementations and that these implementations 
have been realized, analysed and evaluated. 

About this Book 

Now that we have presented views on the development of parsing theory, it 
is useful and necessary to review the contribution of the research reported on 
in this book to the future development of parsing theory. It is an important 
contribution. Properties of existing parsing algorithms are unraveled in detail 
in the first two parts, ‘Exposition’ and ‘Foundation’. The results are used to 
shed new light on these algorithms and to compose new algorithms. Parsing 
schemata, refinement and filtering are introduced as tools and used by the 
author, as a surgeon using his blade, to tackle the subject. The insight ob- 
tained in the first two parts is used in the third part, ‘Application’, to study 
the parsing problem for unification grammars (while at the same time an 
exceptionally clear exposition of unification methods is given), to study and
introduce new versions of left-corner, head-corner and generalized LR parsing
algorithms, and to clarify Rytter’s efficient parallel parsing algorithm for
the design of Boolean circuit parsers. Throughout the book the approach 
taken allows observations on parallelization and parallel implementations of 
the discussed algorithms. 

Computer scientists, computational linguists and language engineers can 
profit from the knowledge that is to be found in this unique book. As such 
it can be considered as a milestone of the 1990s in the field of parsing theory 
and its applications. 



Anton Nijholt 

Professor of Computer Science 
University of Twente, 
Enschede, The Netherlands 
1. Introduction



Syntax describes the structure of language. A well-formed sentence can be 
broken down into constituents, according to syntactic rules. A constituent 
that covers more than a single word can be broken down into smaller con- 
stituents, and so on. In this way one can obtain a complete, hierarchical de- 
scription of the syntactic structure of a sentence. A computer program that 
attributes such structures to sentences is called a parser. This book is about 
parsers, and our^ particular concern is how such parsers can be described in 
an abstract, schematic way. 

This is not the first book about parsing (nor will it be the last). In 1.3 we 
discuss our specific contribution to the theory of parsing. But before zooming 
in on the research questions that will be addressed, it is appropriate to make 
a few general remarks. 

In the analysis of language we make a distinction between form and mean- 
ing. The relation between form and meaning is an interesting and not entirely 
unproblematic one. There are sentences that are grammatically correct, but 
do not convey any sensible meaning, and ill-formed sentences with a per- 
fectly clear meaning. But we are not concerned with meaning of language 
and restrict our attention to form. 

The form of a language is described by the grammar. Grammatical analysis
can be further divided into morphology, describing the word forms, and
syntax, describing sentence structures. In computer science, the words “grammar”
and “syntax” have the same meaning, because computer programming
languages do not have any morphology. In linguistics, despite the fact that
these words are not equivalent, “syntax” is also known as “phrase structure
grammar” which - sloppily but conveniently - is often abbreviated to “grammar”.
But we are not concerned with morphology either, and throughout
this book the word “grammar” refers to phrase structure grammar, unless
explicitly stated otherwise.

^ It is common practice in scientific texts written by a single author to use the
plural first person forms “we” and “our” to mean “I” and “my”. This is to be
understood as pluralis modestiae: scientific research is a group activity and not
one’s individual merit. I will conform to this custom and mostly use the plural
form. I will use the singular form for personal comments and also for particularly
strong claims, where I want it to be clear that none of my tutors and colleagues
share any part of the blame, should I be proven wrong.

We will not discuss the grammar of any existing language in any detail. 
So, with some overstatement, one could say that this book is not about 
language at all. The objects of study are formalisms that are used to describe 
the syntax of languages, and the parsing of arbitrary grammars that can be 
described in such formalisms. This is a useful scientific abstraction. Rather 
than making a parser for a particular grammar for a particular language, 
one constructs a parser that works on a suitable class of grammars. For any 
grammar within that class, a program can be instantiated that is a parser 
for that particular grammar. 
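
The idea of instantiating a program from a grammar can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the book: the function name `make_recognizer`, the dictionary encoding of a grammar, and the chosen class of grammars (context-free grammars without empty right-hand sides and without cycles of unit rules). For brevity the instantiated program only answers whether a sentence is well-formed; it does not build any structure.

```python
from functools import lru_cache


def make_recognizer(grammar, start):
    """Instantiate a well-formedness checker for one grammar of the class.

    Assumed class (hypothetical, for illustration): context-free grammars
    with no empty right-hand sides and no cycles of unit rules (A -> B -> A).
    `grammar` maps each nonterminal to a list of right-hand sides (tuples);
    any symbol that is not a key of `grammar` is treated as a terminal.
    """
    rules = {lhs: [tuple(rhs) for rhs in rhss] for lhs, rhss in grammar.items()}

    def recognize(words):
        words = tuple(words)

        @lru_cache(maxsize=None)
        def derives(symbol, i, j):
            # Can `symbol` derive exactly words[i:j]?
            if symbol not in rules:  # terminal symbol
                return j == i + 1 and words[i] == symbol
            return any(matches(rhs, i, j) for rhs in rules[symbol])

        @lru_cache(maxsize=None)
        def matches(rhs, i, j):
            # Can the symbol sequence `rhs` derive exactly words[i:j]?
            if len(rhs) == 1:
                return derives(rhs[0], i, j)
            # Each symbol covers at least one word, so the first symbol may
            # end no later than j - (len(rhs) - 1).
            return any(
                derives(rhs[0], i, k) and matches(rhs[1:], k, j)
                for k in range(i + 1, j - len(rhs) + 2)
            )

        return len(words) > 0 and derives(start, 0, len(words))

    return recognize


# One grammar of the class: simple arithmetic expressions (left-recursive).
ARITHMETIC = {
    "E": [("E", "+", "T"), ("T",)],
    "T": [("T", "*", "F"), ("F",)],
    "F": [("(", "E", ")"), ("a",)],
}

recognize_arithmetic = make_recognizer(ARITHMETIC, "E")
print(recognize_arithmetic("a + a * a".split()))  # True
print(recognize_arithmetic("a + * a".split()))    # False
```

The same `make_recognizer` can be called with any other grammar that satisfies the stated assumptions; nothing in the function body refers to a particular grammar or language.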

Natural language is informal. Language is living, continuously evolving.
The most rapidly changing part of a language is the lexicon. New words are 
added all the time and old words obtain new meanings and connotations; 
no lexicon is ever complete. The most elusive aspect of language is meaning. 
We live in an informal world, and any formal theory of meaning is at best 
an approximation of “real” meaning. The grammar (comprising syntax and 
morphology) is a rather more stable part of language. Grammars do change 
over time, but these changes are slow and few. If we are to construct computer 
systems that handle natural language in some way or other, it is a small and 
acceptable simplification to say that the grammar of a language is fixed. 

Despite the fundamental differences between natural languages and pro- 
gramming languages, there is some overlap in parsing theory of both fields. 
Grammar formalisms that are used in both fields share the notion of context- 
free grammars (CFGs). For a complete description of the structure of a lan- 
guage, CFGs have too many limitations and one needs more powerful for- 
malisms. But for the purpose of constructing parsing techniques, it makes 
sense to break up the complex task of parsing into different levels. Hence
it is useful to distinguish between a context-free backbone that describes the
“core” of the grammar and augmentations to the context-free formalism that 
describe additional characteristics of the language.^ We are primarily inter- 
ested in parsing of natural languages, but many issues have some relevance 
for programming language parsing as well. Only in Chapters 7, 8, and 9 do we
concentrate on unification grammars, a modern formalism (or, to be precise,
a group of formalisms) that is specifically designed to describe natural
language grammars. 

^ This does not necessarily mean that a parser should first construct a context-free
parse and afterwards augment this with other features. A parser that integrates
these aspects into a single process can still be thought of as consisting of different
(but interacting) modules for context-free phrase structure analysis and for
evaluation of other features.

The fact that this book is at the interface between computer science and
computational linguistics has advantages and disadvantages. On the positive
side, the purpose and contents of the discussed subjects must be explained to
a heterogeneous audience. This means that one cannot - as many theoretical 
computer scientists, by virtue of their specialty, are inclined to do - engage in 
increasingly technical and formal reasoning and along the way forget about 
the motivation behind the theory that is being developed. One has to make 
clear what is being done and why it makes sense to do it that way; not all 
the readers will be familiar with the culture of one’s own sub-field in which 
such considerations are part of common knowledge and, if the subject is 
well-established, might never be questioned. A disadvantage, perhaps, is the 
increased length of the text. Many subjects could be discussed rather more 
concisely for a small group of fellow specialists. But this has a positive side as 
well: at least some chapters and sections should be easy reading. The more 
mathematically inclined reader with some knowledge of the field may skip 
large pieces of introductory text and trivial examples and move straight to 
definitions, theorems, and proofs. The less mathematically inclined reader, 
on the other hand, may skip much of the technical stuff if he^ is prepared 
to take for granted that the claimed results can be formally established. To 
the reader who might be put off by the size of this volume it is perhaps a 
comforting thought that many parts can be read independently and hardly 
anybody is expected to read everything. 

In Section 1.1 we will devote a few words to the history of syntax as a 
field of study. Phrase structure grammars and parsing are introduced in 1.2, 
the general idea of parsing schemata is presented in 1.3. Section 1.4, finally, 
gives an overview of the following chapters. 



1.1 The structure of language 



Modern linguistics starts in the 1950s with the work of Noam Chomsky. He
was the first (in the Western world^) to develop a formal theory of syntax.
Native speakers of a language have an intuitive understanding of the syntax.
One is able to understand a sentence as syntactically correct, even though it
does not convey a sensible meaning. An example, given by Chomsky, is the
sentence “Colorless green ideas sleep furiously.” Even though it is nonsense,
the syntax is correct, in contrast to a string of words “Furiously sleep ideas
green colorless”. This shows that syntax is autonomous: one does not need
to know the meaning of a sentence in order to decide whether the sentence
is well-formed or ill-formed. People with the same native language, despite
great differences in learning and linguistic felicity, share this intuition of what
is syntactically correct. It is this human faculty of syntax that Chomsky set
out to investigate.

^ In contexts where the gender of a third person is of no importance, I will sometimes
write “he” and sometimes “she”.

^ A formal grammar of Sanskrit (as it was spoken 1000 years earlier but preserved
in ritual Vedic texts) was produced between 350 B.C. and 250 B.C. by the Indian
scholar Panini. This was unknown to the European school of general linguistics
(with its roots in the Greek and Roman tradition) until the 19th century. Panini
used rewrite rules, both context-free and context-sensitive. His grammar was
more concerned with morphology than syntax, as word order in Sanskrit is rather
free (Cf. [Staal, 1969], [Bharati et al., 1995]).

A syntactic theory, like any scientific theory, is inductive. A theory can 
never be derived from a given set of facts, however large. The design of a 
theory is speculative. But when a theory has been postulated, one can inves- 
tigate how well it matches the facts. A good theory of syntax will describe 
as well-formed those sentences that are recognized as evidently well-formed 
by native speakers and describe as ill-formed those sentences that are evi- 
dently ill-formed. In between these, there is a group of sentences of which the 
correctness is doubted, even by grammarians. One should not worry about 
these fringe cases and let the theory decide. Chomsky set forth to develop 
such a theory by introducing a grammar formalism and describing the syntax 
of English by means of that formalism. In order to obtain a universal theory 
of syntax, it should be possible to describe the syntax of all human languages 
in similar fashion. 

But before we discuss any detail of modern linguistics and computational 
linguistics, let us consider the question why everything Chomsky did was so 
new. What was wrong with pre-Chomskian linguistics, and why do we know 
so little about it? 

Science makes abstractions of the world. A coherent set of abstractions 
is called a paradigm. Science (or at least good science) is objective within a 
paradigm, but the question whether a given set of abstractions is better than 
a set of different abstractions cannot be answered scientifically. Thomas Kuhn 
[1970] has shown that scientific knowledge is not necessarily accumulative. In 
a scientific revolution an old, established paradigm is rejected in favour of a 
new one; our understanding of the world is reconstructed in terms of the new 
paradigm. Chomsky initiated such a paradigm shift. 

Many aspects of syntactic theory as we see it now were in fact known in 
pre-Chomskian times. But they were seen in a different light. Linguistic re- 
search concentrated on other issues. Linguistics described the languages that 
occur in the world, and their development. An important sub-field was that 
of comparative linguistics: how are languages related to one another, and how 
do languages develop over time? A major achievement is the reconstruction 
of the development of Indo-European languages from a common ancestor. 

Comparative linguistics must be based on facts, and these facts are pro- 
vided by descriptive linguistics. The description of existing languages did 
include the syntax. Syntax was a collection of constructs that could be used 
to form sentences. But the interesting point about syntax was in which way it 
differs from and corresponds to the syntax of other languages. When Panini’s
grammar of 3000-year-old Sanskrit became known to Western scholars, this
gave a great impulse to comparative linguistics, not so much to general
theories of language.

Wilhelm von Humboldt [1836] was the first in Europe to note that only 
a finite number of rules is needed to construct a language with an infinite 
variety of sentences. But theories of language in the Western tradition had 
since antiquity been troubled by a mixture of facts and philosophical precon- 
ceptions. They discussed “the place of language in the universe” [Bloomfield, 
1927] rather than the structure of language. It took another century to dis- 
entangle these issues, get rid of all metaphysical speculation and simply take 
the facts for the facts. Leonard Bloomfield [1933] is generally seen as the 
person who established general linguistics as a science. 

In this view, distinguishing “correct” sentences and forms from “incorrect”
ones was a non-issue. Or even worse, it was a hobby of schoolmasters
and people of some learning but with no clue about contemporary linguistics.
Linguistics as a science is descriptive, not prescriptive.

Many elements of modern syntactic theory are given already by Bloom- 
field, but (as we have pointed out abundantly) from a different perspective. 
Constituents could be decomposed into smaller constituents, hence, as we see 
it now, syntax trees are implicitly defined as well. There was a distinction 
between recursive (endocentric) and non-recursive (exocentric) constituent
formation. It was stipulated that every language has only a small number of 
exocentric constructs. 

It was Chomsky [1957] who put the notion of competence grammar on 
the linguistic agenda, and started to develop a formal theory of syntax. He 
introduces transformational grammar (TG) and compares it with two other 
formalisms that could serve as a basis for such a linguistic theory. These 
two formalisms are nowadays (but not then) known as finite state automata 
and context-free grammars. The first is shown to be insufficient (because it 
cannot handle arbitrary levels of recursion). The second is also rejected. A 
transformational grammar is much smaller and more elegant than a context- 
free grammar for the same language. Moreover, a transformational grammar 
provides more insight as it shows the relation between different, but related 
sentences. A small set of kernel sentences is produced by a set of rewrite
rules (that constitute a context-free grammar). All other sentences can be 
produced from these kernel sentences by applying transformations. In this 
way a much smaller number of rules is needed than in a context-free grammar 
of English - if one exists.^ 



^ Whether English can be described by a context-free grammar was posed as an
open question by Chomsky [1957]. The issue has attracted a lot of discussion.
Pullum and Gazdar [1982], in a review of the debate, inspected all the arguments
opposing context-freeness and refuted all of these as either empirically or formally
incorrect. Huybregts [1984], Shieber [1985b], and Manaster-Ramer [1987] have
established beyond doubt that Swiss-German and Dutch are not context-free
languages.

Chomskian linguistics has developed considerably over the last three
decades. The initial notion of a kernel set of sentences has been replaced
by the notion of a deep structure, that is produced by the rewrite rules of
the grammar. Sentences occurring in the language have a surface structure
that is obtained from the deep structure by means of transformations. A
much more elaborate version of TG, also including semantics, is known as
the standard theory [Chomsky, 1965]. Continuing research led to an extended
standard theory in the 1970s. But transformational grammar was eventually
abandoned in favour of Government and Binding (GB) theory [Chomsky,
1981].

A context-free phrase structure grammar of a language has many more
rules than a transformational grammar, but from the perspective of computational
linguistics, context-free grammars are much simpler. Parsing a sentence
according to a transformational grammar is, in general, not computationally
tractable (and the same holds for GB), whereas parsing of context-free grammars
can be done efficiently. General-purpose context-free grammars have
been constructed that have an adequate coverage of English phrase structure
(see, e.g., Sager [1981]).

As has been stated in the introduction of this chapter, there are various
ways in which other grammatical information (such as subject-verb agreement)
and semantic information can be added to a context-free phrase structure.
The trend in computational linguistics is towards so-called unification grammars,
in which this distinction is blurred. Nevertheless, for the purpose of
constructing efficient parsers, it is useful to keep making a distinction between
phrase structure and other syntactic and semantic features. The first
six chapters deal exclusively with (context-free) phrase structure and we postpone
an introduction of unification grammars to Chapter 7.

The development of “high-level”, third generation programming languages
started in the 1950s as well. Before such languages were available,
one had to instruct computers in languages that are much more closely related
to the hardware capabilities of such a machine. Move a number from
this location to that location; if the contents of a specific memory location
is zero, then jump to some other position in the computer program; and so
on. High-level languages offered the possibility of “automatic programming”.
Rather than writing machine instructions (at the level of second generation
languages), one could concentrate on what a program is supposed to do. Such
a program could be translated into “real” computer language by means of
another program, called a compiler.

In the definition of the programming language Algol 60 the structure
of the grammar was described by a formalism that later became known as
Backus-Naur Form (BNF). It was only after the publication of the Algol
definition [Naur, 1960] that computer scientists realized the similarities between
BNF and the phrase structure grammars that were studied by linguists. Ginsburg and
Rice [1962] proved that BNF is equivalent to context-free grammars. This
insight sparked off a body of research in formal languages, which is now
part of the foundations of computer science as well as formal linguistics. Hence
it is not a coincidence that, despite the radical differences in structure and
complexity, there is considerable overlap in the underlying theory of syntax
of natural languages and programming languages.



1.2 Parsing 

We will define the parsing problem for context-free (backbones of) grammars 
and discuss briefly why this is still a relevant area for research. We do not 
dwell upon the historical development of various parsing techniques. This 
cannot be properly done in a few paragraphs without getting involved in 
some technical detail. The interested reader is referred to Nijholt [1988] for 
a good and easy to read overview. 

A parse tree is a complete, hierarchical description of the phrase structure 
of a sentence. The parsing problem, for a given grammar and sentence, is to
deliver all parse trees that the grammar allows for that sentence. Stated in this 
very general way, the parsing problem is actually underspecified: we do not 
prescribe a formalism in which these parse trees are to be denoted. There are 
techniques to specify such a forest of trees in a compact way, without listing 
all the trees individually (cf. Chapter 12). The savings can be considerable. 
Because we look at syntactic structure only and do not rule out parse trees 
that yield an absurd interpretation, most sentences have a lot of different 
parse trees. 

Related to the parsing problem is the recognition problem. For a given 
grammar and sentence it is to be determined whether the sentence is well- 
formed (i.e., at least one parse tree exists). This is a fully specified problem. 
There are only two possible answers and how these are denoted (“true” or 
“false”, or “1” or “0”) is not relevant. 

An algorithm is a prescription of how to solve some problem in a systematic 
way. Algorithms can be encoded in programming languages, so that a 
computer can solve the problem. A parsing algorithm, or parser^ for short, 
is an algorithm that solves the parsing problem. A recognizing algorithm, or 
recognizer for short, is an algorithm that solves the recognition problem. 

There is an intermediate form between parsers and recognizers. Such algorithms 
provide an answer to the question whether the sentence is well-formed 
and, additionally, deliver a structured set of intermediate results that have 
been computed in order to obtain the answer. These intermediate results encode 
various details about the sentence structure and are of great help to 
actually construct parse trees. Such algorithms could be called “enhanced 
recognizers”, but in the literature these are usually called parsers as well, 
despite the fact that no parse trees are produced. With the exception of 
Chapters 2 and 3 we will mostly be concerned with parsers in this improper 
sense. 

^ Usually a parser is understood to be a computer program, rather than an algorithm 
encoded in the program, but this distinction is irrelevant here. 

In different sub-fields there are some variants of the parsing problem. In 
the field of stochastic grammars, the task is to find the most likely parse 
tree according to some probability distribution. In parsing of programming 
languages one is interested in a single, uniquely determined parse tree. In 
case of ambiguities there must be additional criteria that specify which is the 
right parse tree - otherwise a program may have an ambiguous interpretation, 
which is highly undesirable. 

Programming language grammars are much simpler than natural lan- 
guages, but the sentences (programs) are much longer. Hence the specialized 
techniques to construct efficient parsers are different, but there is some cross- 
over. Mostly this is the adaptation of computer science parsing techniques to 
parsing of natural languages. Occasionally, however, it also happens that the 
compiler construction community adapts techniques that were developed in 
computational linguistics. 

The theory of parsing is some 30 years old now, and one may wonder 
whether there is anything of general interest that has not yet been uncovered 
in this field. There are always enough open questions (and more answers 
lead to even more open questions) and a field is never finished. But as the 
body of knowledge grows, the frontier of research is pushed to more and 
more specialized issues in remote corners of knowledge that perhaps nobody 
except a small bunch of fellow scientists is even aware of. There are two 
reasons, however, that make parsing theory an interesting field up to this 
day. 

Firstly, there is the issue of parallel parsing. A variety of parallel parsing 
algorithms has been proposed in the last decade, in particular in the field 
of Computational Linguistics [Hahn and Adriaens, 1994]. There are great 
differences, not only in the type of parsing algorithm employed, but also in 
the kind of parallel hardware (or abstract machine) that such algorithms 
should run on. In order to compare the relative merits of different parallel 
parsing algorithms, one should start to describe these in a uniform way. Pars- 
ing schemata have originally been conceived as a framework for comparing 
different parallel parsers on a theoretical level. In order to find such a com- 
mon description, one has to abstract from a great many details. As it turned 
out, the framework is also useful for a high-level description of traditional, 
sequential parsing algorithms; it is stated nowhere that an implementation 
of a parsing schema must be parallel. 

Secondly, the formalisms in which natural language grammars are de- 
scribed have changed over the last decade. This has some consequences for 
parsing natural language grammars. Logic has gained an important role in 
the interface between grammarians and computers. On the one hand, there 
are programming languages such as Prolog or, more recently, Constraint Logic 
Programming (CLP) [Jaffar and Lassez, 1987], [Cohen, 1990], that allow programs 
to be written as a set of logic formulae. On the other hand, grammars 
can also be written as a set of logic formulae. A parse, then, corresponds 
to a proof. The sentence is postulated as a hypothesis, and the sentence is 
correct (and a parse is produced) if a formula can be proven that can be 
interpreted as “this is a sentence (and its structure is so-and-so)”. Such a 
proof can be carried out by a Prolog or CLP interpreter, i.e., a computer 
program. So we have another level of “automatic programming”, where one 
only needs to specify the grammar and there is no more need to construct 
a parser. There is a catch, however. Such specifications in logic can (under 
certain restrictions) be interpreted directly by machines, but that does not 
necessarily mean that a machine will do so in an efficient manner. From a 
computational point of view it is more appropriate to see such a grammar 
as an executable specification, not as the most suitable implementation of a 
parser. Computer science, therefore, can make valuable contributions to the 
construction of efficient parsers for these grammar formalisms. 

A nice example of this last point is the following. The context-free back- 
bone is no longer particularly relevant for the specification of a grammar. 
Hence, as things go in evolution, context-free backbones tend to dwindle away. 
A modern grammar specification with “degenerated” context-free backbone, 
typically has a much larger context-free backbone hidden inside the gram- 
mar. It has recently been shown by Nagata [1992] and Maxwell and Kaplan 
[1993] that retrieving and using a more elaborate context-free backbone can 
substantially increase the efficiency of a parser. 



1.3 Parsing schemata 

There are many different ways to design a parser. One can build trees branch 
by branch, adding grammar productions one at a time. Or one can collect 
various bits of tree and combine small trees to larger trees in various ways. 
The important thing is that it is a constructive process. Parsing schemata 
can be used to describe any parser that works in a constructive way. 

There are non-constructive parsers as well. An entirely new brand of com- 
putation is embodied in neural networks. We will briefly discuss these in 
Chapter 14. But almost all parsers that run on von Neumann machines (i.e., 
computers as we know them) are constructive.^ A constructive parser computes 
a series of intermediate results and these (or, to be precise, most of 
these) are used for the computation of next, more advanced intermediate 
results, until the final result is established. 

A parsing schema focuses on these intermediate results, called items in 
parsing terminology. The essential traits of a parser can be described as fol- 
lows. 

• for any given sentence, an initial set of items is constructed, 

• for any given grammar there is a set of rules that describe how new (larger) 
items can be computed from known items. 

All that remains to be done, then, is apply all the rules to all the items 
over and over again until all items that can be computed from the initial set 
have been computed. We see the final set of items as the result delivered 
by a parser. Some special items indicate that a parse tree exists. Hence the 
sentence is well-formed if and only if at least one of these special items is 
computed. 
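
The two ingredients just described (an initial item set and a set of rules, applied until nothing new appears) can be sketched in a few lines. This is only an illustration under stated assumptions, not the formal definition given later in the book: the toy grammar, the lexicon, the item shape `(category, i, j)` and all function names are hypothetical choices made for this example.

```python
from itertools import product

# Hypothetical toy grammar with binary productions only: lhs -> (left, right).
PRODUCTIONS = [
    ("S", ("NP", "VP")),
    ("NP", ("*det", "*noun")),
    ("VP", ("*verb", "NP")),
]

# Hypothetical lexicon mapping words to lexical categories.
LEXICON = {"the": "*det", "a": "*det", "cat": "*noun",
           "mouse": "*noun", "catches": "*verb"}


def initial_items(words):
    """Initial item set: one item (category, i, i+1) per word."""
    return {(LEXICON[w], i, i + 1) for i, w in enumerate(words)}


def deduce(items):
    """Apply every rule once to every pair of adjacent known items."""
    new = set()
    for (b, i, k), (c, k2, j) in product(items, repeat=2):
        if k != k2:
            continue  # the two items do not cover adjacent spans
        for lhs, rhs in PRODUCTIONS:
            if rhs == (b, c):
                new.add((lhs, i, j))
    return new


def closure(words):
    """Apply all rules over and over until no new item can be computed."""
    items = initial_items(words)
    while True:
        new = deduce(items) - items
        if not new:
            return items
        items |= new


def is_well_formed(words):
    """The special item (S, 0, n) indicates that a parse tree exists."""
    return ("S", 0, len(words)) in closure(words)


print(is_well_formed("the cat catches a mouse".split()))  # True
```

Nothing in `closure` prescribes an efficient search order or storage structure; it merely computes the final item set, which is all the description above asks for.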

A parsing schema is not an algorithm. An algorithm has a number of 
aspects that are absent in a parsing schema: 

• data structures in which computed items can be stored and efficiently 
searched for; 

• control structures, making sure that all relevant steps are taken, in some 
appropriate order; 

• (only for parallel algorithms) communication structures, ensuring that relevant 
items are exchanged between different cooperating processors. 

Each of these structures can be designed in a variety of ways, leading to 
a variety of different parsing algorithms with a single underlying parsing 
schema. It is by abstracting from these structures that the essential traits 
of very different parsing algorithms can be described in a uniform way and 
compared. 

A number of different questions come to mind. Firstly, there are some 
technical concerns. How general is the framework? The fact that all parsers 
compute intermediate results does not give any guarantee that the kinds of 
intermediate results computed by different algorithms are compatible. Secondly, 
what is the relation between this framework and other parsing frameworks 
that have been published in the past? Thirdly, is there any purpose 
in writing down parsing schemata, other than an exercise in manipulation of 
formal systems? We will briefly address each of these questions. 

^ An example of a nonconstructive parser (that is in fact an enhanced recognizer) 
is the LE(p,q) algorithm of Oude Luttighuis [1991], that parses a 
restricted class of grammars in logarithmic time. It makes essential use of a 
nonconstructive parallel bracket matching algorithm [Gibbons and Rytter, 1988]. 
The question whether a string of brackets is well-formed is answered in logarithmic 
time, but without giving a clue as to which opening bracket matches which 
closing bracket. 

Different parsers produce different kinds of intermediate results. There 
are a lot of different “item-based” parsers that use a lot of different kinds of 
items. In Chapter 4 a theory of items is developed, that provides a general 
understanding of what an item is. All the various items that are used by 
different parsers can be seen as special cases of these general items. It is 
merely the notation of items that differs among parsers (and for good reason: 
in the description of a parser it makes sense to use an item notation that is 
most convenient for that particular parser). 

Not all parsers are “item-based”, however. So what about those that use 
radically different kinds of intermediate results? We will argue that every 
constructive parser is, in principle, item-based. This principle might be hidden 
from the surface and not show up in the parsing algorithm. A typical example 
is a so-called LR parser, which is based on a state transition function and 
a stack of states as the guiding structures. In this particular parser, the 
items do not appear at run time, while parsing a given sentence, but have been 
employed at compile time, in the construction of the table that encodes the 
state transition function. It is possible to partly “uncompile” an LR parser 
and show at run time, at each step, which items are in fact recognized. Any 
constructive parser, in similar fashion, has an underlying item-based parser 
and hence can be described by a parsing schema. 

Parsing schemata are a generalization of the chart parsing framework 
[Kay, 1980], [Winograd, 1983]. For every chart parser it is rather trivial to 
write down an underlying parsing schema, but a schema can be implemented 
by a great many algorithms that need not even remotely resemble chart 
parsers (in which case the relation between algorithm and schema will not 
be entirely trivial). One could say that the canonical implementation of a 
parsing schema is a chart parser. 
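
To make the step from schema to chart parser concrete, the following is a much simplified sketch of an agenda-and-chart loop. It is an assumption-laden illustration rather than the chart parsers of Kay or Winograd: it uses only complete constituents `(category, i, j)` as items, and the rule table and function name are hypothetical. The chart (with its position indexes) plays the role of the data structure, and the agenda plays the role of the control structure.

```python
from collections import defaultdict, deque

# Hypothetical binary rules, indexed by right-hand side: (left, right) -> [lhs].
RULES = {
    ("NP", "VP"): ["S"],
    ("*det", "*noun"): ["NP"],
    ("*verb", "NP"): ["VP"],
}


def chart_parse(categories):
    """Return the chart for a list of lexical categories, one per word."""
    chart = set()                  # data structure: all items found so far
    ends_at = defaultdict(set)     # index: position -> items ending there
    starts_at = defaultdict(set)   # index: position -> items starting there
    # control structure: items waiting to be processed
    agenda = deque((cat, i, i + 1) for i, cat in enumerate(categories))
    while agenda:
        item = agenda.popleft()
        if item in chart:
            continue               # already processed, nothing new to derive
        chart.add(item)
        cat, i, j = item
        ends_at[j].add(item)
        starts_at[i].add(item)
        # Combine with items immediately to the left of the new item.
        for left_cat, h, _ in ends_at[i]:
            for lhs in RULES.get((left_cat, cat), []):
                agenda.append((lhs, h, j))
        # Combine with items immediately to the right of the new item.
        for right_cat, _, k in starts_at[j]:
            for lhs in RULES.get((cat, right_cat), []):
                agenda.append((lhs, i, k))
    return chart


chart = chart_parse(["*det", "*noun", "*verb", "*det", "*noun"])
print(("S", 0, 5) in chart)  # True
```

Replacing the queue by a stack, or by several queues on cooperating processors, changes the algorithm but not the set of items that ends up in the chart.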

Parsing schemata are useful devices in several respects. This research was 
started with the purpose of bringing some order into the field of parallel pars- 
ing. A great variety of parallel parsers have been published in the last decade 
(cf. Alblas et al. [1994]). Although our work has shifted to a more general 
nature, quite a few of these algorithms are incorporated in the framework 
presented here. 

An interesting kind of application is cross-fertilization of different pars- 
ing algorithms with related underlying schemata. When the relation between 
algorithms is understood, most improvements and optimizations of one algorithm 
can easily be ported to related algorithms. A good example of 
cross-fertilization is the Parallel Bottom-up Tomita algorithm described in 
Chapter 13. A parallel version of Tomita’s algorithm is obtained in which 
the division of tasks over processors is organized radically differently from the 
parallel Tomita parsers that have been formulated before. The inspiration to 
look at the problem from a different angle came from a comparison with Earley’s 
algorithm, where bottom-up parallelization is simply the natural thing 
to do. 

On a more fundamental level, one can see parsing schemata as a separate, 
well-defined level of abstraction in between grammars and parsing algorithms. 
A grammar specifies implicitly what the parse trees of a sentence are. A 
parsing algorithm specifies explicitly how these parse trees can be computed. 
A parsing schema specifies which steps could be taken that guarantee the 
construction of all parse trees, without considering data structures, control 
structures and communication structures. Such a well-defined intermediate 
level is a valuable aid because it allows a problem to be split into two smaller 
and easier problems. This is true for practical applications (the design of 
programs) as well as theoretical applications (the construction of proofs). It 
is rather easier to prove the correctness of a parsing schema than that 
of a parser, simply because there is much less to prove. The correctness of a 
parser, then, can be established by proving that it is a correct implementation 
of a schema that is known to be correct. 

It is very hard to come up with the “right”, useful abstractions and once 
you have found them, the result sometimes looks trivial. But this is usually a 
sign of being on the right track; if a complicated issue can be cast into terms 
that make it less complicated, something valuable has been gained. Parsing 
schema specifications are concise and formal, but nevertheless relatively easy 
to understand. 

That this framework is not merely a theoretical nicety but a useful ab- 
straction indeed is shown, I hope, by the many nontrivial applications of 
parsing schemata that are worked out in this book. 



1.4 Overview 

A scientific text is tree-structured. Sections can be divided into subsections; 
these into sub-subsections, and so on ad libitum. I have tried not to give 
in to this temptation and use the chapter as the main structuring element, 
following the adage 

if a subject is worth spending 50 pages on, it deserves more than a 
single chapter. 

A broad outline of the contents is given by the division into three parts: 

Part I, Exposition (Chapters 1-2) introduces the topics that will be treat- 
ed in the remaining parts. 

Part II, Foundation (Chapters 3-6) defines a formal theory of parsing 
schemata. 
Part III, Application (Chapters 7-14) shows that parsing schemata can 
be employed for a series of different purposes. 

In more detail: 

Chapter 2 is a more detailed but informal introduction to parsing and pars- 
ing schemata. 

The basic idea underlying our work is cast into the metaphor of the 
“primordial soup” algorithm. Rather than worrying about data struc- 
tures, control structures and communication structures we throw a large 
enough supply of elementary trees into a big pot, let these float around, 
meet, interact and form larger trees, until after a very long (perhaps 
infinite) time all potential parse trees will have been formed. Schemata 
for sensible parsing algorithms can be derived by imposing various kinds 
of restrictions on this very general, but equally impractical approach to 
parsing. 

Chapters 3-6 give a theory of parsing schemata for context-free grammars. 
Most of what is done informally in Chapter 2 is done more thoroughly 
in Chapter 3. A notion of parsing schemata is developed in which partial 
parse trees constitute the intermediate results delivered by a parser. 

In Chapter 4, trees are replaced by items. An item can be seen as a 
collection of trees that share certain properties. We give two different 
definitions of items, one of a more theoretical and the other of a more 
practical nature. It is in fact very convenient to use some items that are 
inconsistent with the underlying theory, but it can be shown that this has 
no consequences for the correctness of parsing schemata. After having 
dealt with these rather fundamental issues, some examples of realistic 
parsing schemata are presented in 4.6. 

Chapters 5 and 6 discuss relations between parsing schemata. Chapter 
5 concentrates on refinement (making smaller steps and producing more 
intermediate results) and generalization (extending a parsing schema to 
a larger class of grammars). Chapter 6 deals with filtering, that is, mak- 
ing a parsing schema more efficient by discarding irrelevant parts. Both 
chapters are illustrated with lots of examples, many of them schemata of 
parsing algorithms known from the literature. In Section 6.5 a taxonomy 
of Earley-like parsing schemata is presented. 

Chapters 3-6 can be read on two levels. First and foremost, they consti- 
tute a formal theory of parsing schemata. But somebody who is familiar 
with some of the parsing algorithms that are discussed can get a fairly 
good picture of what is going on by browsing through the many examples. 

Chapters 7-9 extend parsing schemata to unification grammars. 

Chapter 7 is a short and easy to read introduction to unification gram- 
mars for computer scientists who have never had any involvement with 
computational linguistics. 
Chapter 8 extends the formal theory of parsing schemata from context-free 
grammars to (PATR-style) unification grammars. We use a formalization 
of feature structures that is somewhat different from the formal-logical 
approach, but amounts to the same thing for all practical purposes. 
In order to be able to specify transfer of features explicitly, we 
introduce a notion of multi-rooted feature structures that describe the 
interrelations between features of arbitrary sets of objects. Thus we ob- 
tain a neat formalism for specifying parsers for unification grammars. 
For context-free grammar parsing it is pretty clear how a simple, ad- 
equate (but perhaps not the most efficient) algorithm can be obtained 
from a parsing schema. This is not the case for unification grammar pars- 
ing schemata. In Chapter 9 we discuss some essential nuts and bolts of 
unification grammar parsing: unification of feature structures, avoiding 
infinite sets of predicted items, and, last but not least, two-pass parsers 
that use some essential features in a first pass and add all other features 
in a second pass. 

For reading Chapters 7-9 one needs to have a basic understanding of the 
parsing schemata notation, but no detailed knowledge of the material 
covered in Chapters 3-6. 

Chapters 10-11 are about Left-Corner (LC) and Head-Corner (HC) chart 
parsers. These two chapters can be read as a single paper. 

An HC parser does not process a sentence from left to right; it starts 
with the most important words and fills in the gaps later. Because of the 
non-sequential way in which the HC parser hops through a sentence, its 
description is not easy, its correctness proof much less so. LC parsers are 
interesting in their own right (and the question whether LC or HC parsing 
is more efficient is still open for debate). But the main point we have to 
make about LC parsing - that it can be cast into a chart parser - has in 
fact been made already in Section 4.6. The reason to include Chapter 10 
here is that, once it is understood how an LC parser can be defined and 
proven correct, we can understand the rather more complicated HC case 
as a pretty straightforward generalization of the LC case. 

Chapters 10 and 11 exemplify that parsing schemata can be used to get 
a formal grip on highly complicated algorithms. This is the first ever HC 
parser that has been proven correct. 

Chapters 12-13 place Generalized LR parsing within our framework. These 
two chapters can be read as a single paper. 

In Chapter 12, as an example of how non-item-based parsers fit into our 
framework, we discuss Tomita’s Generalized LR parser and uncover the 
underlying parsing schema. Ignoring a few trivial details, one can say 
that this is identical to the parsing schema of an Earley parser. 
In Chapter 13, this last insight is used to cross-breed Tomita’s parser 
with a parallel version of Earley’s parser. Test results of this so-called 
Parallel Bottom-up Tomita parser show a moderate speed-up compared 
to the original Tomita parser. 

Chapter 14 discusses parsing by boolean circuits. 

This chapter gives another, very different application of parsing schemata. 
A maximally parallel implementation of a parsing schema can be obtained 
by executing, at every step, all applicable computations at the same time. 
The control structure of such an algorithm is not dependent on the par- 
ticular sentence, hence (if we assume a maximum sentence length) the 
algorithm can be coded entirely into hardware. Any parsing schema for 
any grammar can be coded into a boolean circuit in this way. 

As a nontrivial example, we apply this to Rytter’s logarithmic-time paral- 
lel parsing algorithm. This leads to a simplification in the algorithm (and 
the proof of its correctness), while the complexity bounds of the boolean 
circuit conform to those known for other parallel machine models. 

Chapter 15, finally, gives some conclusions and prospects for future re- 
search. 



2. The primordial soup framework 

The “primordial soup algorithm” [Janssen et al., 1992] is a metaphor for the 
more abstract notion of a parsing schema. One specifies which trees can be 
constructed during parsing and how these can be constructed; one does not 
specify how these trees are to be searched for and stored. 

We give a very informal introduction, meant only to convey some intuition 
for what is going to be formalized in the next chapters. The reader who prefers 
formal definitions to less precise prose may skip this chapter. 

The general idea of the primordial soup approach is worked out in 2.1. 
Some primordial soup variants that resemble well-known parsing algorithms 
are introduced in 2.2; extensions and related approaches are mentioned in 2.3. 
Section 2.4, finally, gives a brief sketch of the limitations of the primordial 
soup framework and introduces the generalization to parsing schemata. 



2.1 Primordial soup 

A simple example of a parse tree is displayed in Figure 2.1. This tree gives 
a complete context-free phrase structure analysis of the sentence “the cat 
catches a mouse.” A sentence can be broken down into a noun phrase and 
a verb phrase; a verb phrase can be broken down into a verb and a noun 
phrase; a noun phrase can be broken down into a determiner and a noun. 
The words in the sentence have lexical categories determiner, noun, and verb, 
as indicated in the figure. 



[Figure: parse tree with root S over the words “the cat catches a mouse”] 

Fig. 2.1. A simple parse tree 
The parse tree is well-formed according to the following very simple grammar: 

S → NP VP, 
NP → *det *noun, 
VP → *verb NP. 

This grammar, which we call G₁ for further reference, consists of three productions 
or rewrite rules. The left-hand side of a production can be rewritten 
into the right-hand side (and a sentence decomposed accordingly). Grammar 
G₁ has the property that it is binary branching: the right-hand side of every 
production consists of 2 symbols. This is not necessarily the case; grammars 
may also have productions with 0, 1, 3, or more right-hand side symbols. But 
there are some very simple and elegant parsing algorithms that work only for 
binary branching grammars. 

Grammar G₁ is extraordinarily small; a reasonable context-free phrase 
structure grammar for English contains a few hundred rules.^ 

A slightly larger grammar G₂, that will be used in many examples, is 
given by the following series of productions: 

S → NP VP 
S → S PP 
NP → *det *noun 
NP → NP PP 
PP → *prep NP 
VP → *verb NP. 

In addition to noun phrases and verb phrases, it contains prepositional 
phrases. Parsing “the cat catches a mouse” according to G₂ yields the parse 
tree that was already shown in Figure 2.1. The canonical example sentence 
that can be parsed with this grammar is “the boy saw a man with a telescope.” 
The sentence has two parse trees, reflecting the different interpretations that 
can be given to it. 
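
The claim that the telescope sentence has two parse trees can be checked mechanically. The sketch below counts parse trees for the grammar above by trying every split point of every span; it relies on the grammar being binary branching. The production list follows the grammar as read from the text (with the second production taken to be S → S PP), while the lexicon and the function name are assumptions introduced for this example only.

```python
from functools import lru_cache

# Productions of the larger example grammar: (lhs, left, right).
G2 = [
    ("S", "NP", "VP"),
    ("S", "S", "PP"),
    ("NP", "*det", "*noun"),
    ("NP", "NP", "PP"),
    ("PP", "*prep", "NP"),
    ("VP", "*verb", "NP"),
]

# Hypothetical lexicon: the text gives none, so these entries are assumed.
LEXICON = {
    "the": "*det", "a": "*det",
    "cat": "*noun", "mouse": "*noun", "boy": "*noun",
    "man": "*noun", "telescope": "*noun",
    "catches": "*verb", "saw": "*verb",
    "with": "*prep",
}


def count_parse_trees(sentence, root="S"):
    """Count the parse trees with the given root that span the whole sentence."""
    cats = tuple(LEXICON[word] for word in sentence.split())

    @lru_cache(maxsize=None)
    def count(symbol, i, j):
        # A single word is covered only by its own lexical category.
        if j - i == 1:
            return 1 if cats[i] == symbol else 0
        total = 0
        for lhs, left, right in G2:
            if lhs == symbol:
                # Try every way of splitting the span between left and right.
                for k in range(i + 1, j):
                    total += count(left, i, k) * count(right, k, j)
        return total

    return count(root, 0, len(cats))


print(count_parse_trees("the boy saw a man with a telescope"))  # 2
print(count_parse_trees("the cat catches a mouse"))             # 1
```

A count of zero means the string is not a sentence, so the same function doubles as a recognizer.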

It will be obvious that grammar G₂ is also far too small to be of any 
practical use. Just as G₁, its only purpose is to allow small, clear examples to 
illustrate various types of parsers. 

From the first example it should be clear how a parse tree can be constructed 
for some sentence, given the productions of the grammar and the 
lexical categories of the words in the sentence. A formal definition will be 
given in Section 3.1. 

We want to design a computer system that constructs all parse trees for 
some grammar and an arbitrary string of words.^ 

^ See, for example, the context-free grammars for English as given by Sager [1981] 
and Tomita [1985]. 

^ A string of words is called a sentence only if it is well-formed according to the 
grammar. If no parse trees are found, then the string is not a sentence. 
We start with a very simple recipe, based upon the idea that large trees 
can be composed from smaller trees. A larger tree can be constructed by 
grafting the root of some tree onto a leaf of another tree. This can only be 
done, however, if both nodes carry the same label. We begin with an abundant 
supply of elementary trees. These come in two kinds: 

• elementary trees representing the words with their lexical categories, 

• elementary trees representing the productions of the grammar. 

As time proceeds, trees float around, meet and interact, forming larger and 
larger trees. If the sentence is well-formed, parse trees will emerge in the 
primordial soup after a long, but finite amount of time. 

Let us consider the sentence “the cat catches a mouse” again, and grammar 
G₁ as defined above. The trees that are present in the initial primordial soup are 
shown in Figure 2.2 (each different tree is shown only once, but one should 
imagine a sufficiently large number of copies of each tree). The words are 
annotated with their position in the string, so as to remember the word order. 
These trees float around and bump into other trees. Upon such a collision, 
two trees may stick together. If the root of a tree carries the same label as 
the leaf of the other tree, the first tree can be grafted onto the second one. 
The root and leaf node with the same label are merged into a single node. 
An example of tree composition is given in Figure 2.3. 

We have stated that the primordial soup contains an abundant number of 
elementary trees. Hence, as many copies of larger trees can be made as needed. 
A rather more efficient way to simulate this in a computer system is to keep 
single copies of each tree and make combinations of trees nondestructively. 
That is, the new tree is added to the current set of trees, but the trees from 
which it is constructed also remain present. Thus, in a computer simulation 
of the primordial soup, we start with an initial set of trees that contains only 
a single copy of every different kind of tree. For all possible combinations of 
trees in the set it is tried whether new trees can be produced. These new trees 
are added to the set (while the trees from which they are constructed also 
remain present). For each new tree, subsequently, all possible combinations 
with other trees are tried, and so on. This process stops if a situation is 
reached where all trees that can be produced are contained in the set already. 
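
The nondestructive simulation described above might be sketched as follows. Several simplifications are assumptions of this example, not of the text: a tree is abstracted to a pair (root, yield), a word leaf is written as a (word, position) pair, and the names `initial_soup`, `graft_all` and `simmer` are hypothetical. The loop is only guaranteed to stop for a non-recursive grammar such as the small three-production grammar used here.

```python
# The three productions of the small example grammar: (lhs, right-hand side).
G1 = [
    ("S", ("NP", "VP")),
    ("NP", ("*det", "*noun")),
    ("VP", ("*verb", "NP")),
]

# Hypothetical lexicon for the example sentence.
LEXICON = {"the": "*det", "a": "*det", "cat": "*noun",
           "mouse": "*noun", "catches": "*verb"}


def initial_soup(words):
    """One elementary tree per word (with its position) and per production."""
    soup = {(LEXICON[w], ((w, pos),)) for pos, w in enumerate(words, start=1)}
    soup |= set(G1)
    return soup


def graft_all(sigma, tau):
    """All trees obtained by grafting tau onto one matching leaf of sigma."""
    s_root, s_yield = sigma
    t_root, t_yield = tau
    return {(s_root, s_yield[:p] + t_yield + s_yield[p + 1:])
            for p, leaf in enumerate(s_yield) if leaf == t_root}


def simmer(soup):
    """Add new trees until every producible tree is already in the set."""
    soup = set(soup)
    frontier = set(soup)           # trees not yet combined with the others
    while frontier:
        new = set()
        for fresh in frontier:
            for other in soup:     # the originals stay in the soup
                new |= graft_all(fresh, other) | graft_all(other, fresh)
        frontier = new - soup
        soup |= frontier
    return soup


final = simmer(initial_soup("the cat catches a mouse".split()))
parse = ("S", (("the", 1), ("cat", 2), ("catches", 3), ("a", 4), ("mouse", 5)))
print(parse in final)  # True
```

Because nothing restricts which word tree may be grafted where, the final set also contains trees whose leaves are in the wrong order, for instance one with yield "a cat catches the mouse".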
Fig. 2.2. The initial primordial soup for “the cat catches a mouse” 

[Figure: an NP tree with leaves the₁ and *noun is grafted onto the first NP leaf of an S tree with leaves NP, catches₃ and NP; the root and the leaf labelled NP are merged into a single node] 

Fig. 2.3. A root is unified with 
a leaf of another tree 



There is no guarantee, in general, that this process ever halts; it might well 
be the case that an infinite number of trees can be created nondestructively 
from the elementary trees we started with. But we will not be bothered by 
that problem right now; for grammar G₁ it is clear that the primordial soup 
will halt. 

Whether the search for new trees is done systematically or at random, 
sequential or parallel, is not of great concern to us. As the primordial soup 
framework is primarily meant to model parallel parsing, the simplest inter- 
pretation is that at each step all combinations of all trees present in the soup 
are tried for a match.^ In this way, the number of steps that is needed until 
no more new trees can be added is the minimum number of steps in a par- 
allel implementation with unlimited resources. Such details will be discussed 
in Chapter 14, and for the next half a dozen chapters we will not be con- 
cerned with any implementation. The more interesting matter here (that will 
occupy us up to Chapter 6) is what the final contents of the primordial soup 
looks like, once every possible tree has been constructed. Which particular 
search strategy and storage structure is used to compute this final contents 
is irrelevant, as long as every tree that can be constructed eventually will be 
found. 

^ This resembles the “Unity” approach of Chandy and Misra [1988] for initial 
specification of parallel systems. 

In the remainder of this chapter we will use a simple, linear notation for 
trees. If we have a tree with root labelled A and yield (i.e., the sequence 
of labels of leaves, from left to right) α, we may denote such a tree by the 
formula (A ⇝ α). The trees in Figure 2.3, for example, can be denoted by 

(NP ⇝ the₁ *noun), 
(S ⇝ NP catches₃ NP), 
(S ⇝ the₁ *noun catches₃ NP). 

In this notation we abstract from the internal structure of the trees. Tree 
composition (here) only involves roots and leaves, hence a notation where all 
internal nodes and edges are simply replaced by the symbol ⇝ is adequate 
and simple - and saves a lot of paper. Only for elementary trees we write → 
rather than ⇝, indicating that the yield is produced directly by the root and 
that there are no internal nodes, e.g. 

(NP → *det *noun). 

Next, we introduce an operator ◁ that denotes tree composition. In Figure 2.3, 
the construction of the large tree is licensed by the equality 

(S ⇝ NP catches₃ NP) ◁ (NP ⇝ the₁ *noun) 
= (S ⇝ the₁ *noun catches₃ NP). 

A composed tree σ ◁ τ is defined only if some leaf of tree σ and the root of 
tree τ are labelled with the same symbol. 

If more than one leaf of σ corresponds to the root of τ, then the notation 
σ ◁ τ is ambiguous. Formally, the ambiguity of ◁ can be eliminated by 
writing ◁₁ for tree composition with the first matching leaf, ◁₂ for tree 
composition with the second matching leaf, and so on. A tree composition 
σ ◁ᵢ τ is defined only if yield(σ) contains at least i occurrences of root(τ). In 
most cases it is clear what is meant and we do not bother to write the index 
i. 

Note that for the first two trees in Figure 2.3 it holds that 

(S NP catchess NP) <2 {NP ^ thei *noun) 

= {S NP catchess thei *noun). 

There is a problem, however, with the tree $(S \leadsto NP\ \underline{catches}_3\ \underline{the}_1\ {*}noun)$.
The construction of this tree is perfectly legal according to the rule of tree 
composition. But this tree can never be of any use for the construction of a 
parse tree, because the word order is violated. If we allow such trees to occur 
in the primordial soup, then not only the parse trees for the given string will 
emerge, but also the parse trees of all other strings that can be formed from 
the same words. Hence we introduce a special constraint, making sure that 
only the requested string is parsed and no other string. 

Word order constraint: the position numbers that occur as markings of 
leaves of a tree must be increasing from left to right. 



^ This convention has no particular relevance here, but anticipates a more sophis- 
ticated linear tree notation that will be introduced in Chapter 3. 
Hence a tree $(S \leadsto \underline{the}_1\ {*}noun\ \underline{catches}_3\ NP)$ is allowed by the
word order constraint, but a tree $(S \leadsto NP\ \underline{catches}_3\ \underline{the}_1\ {*}noun)$
is discarded and should not enter into the primordial soup. As a consequence,
the only full parse tree that eventually will appear is

$(S \leadsto \underline{the}_1\ \underline{cat}_2\ \underline{catches}_3\ \underline{a}_4\ \underline{mouse}_5)$.
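The composition operator and the word order constraint can be illustrated with
a small Python sketch. It is only an illustration under our own assumptions: the
names `Word`, `Tree`, `compose` and `obeys_word_order` are hypothetical, and a
tree is stored in the linear notation only (root and yield, no internal nodes).

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union


@dataclass(frozen=True)
class Word:
    """A marked terminal: a word together with its position in the string."""
    text: str
    pos: int


Leaf = Union[str, Word]  # a category label or a marked terminal


@dataclass(frozen=True)
class Tree:
    """Linear notation: only the root and the yield are recorded."""
    root: str
    leaves: Tuple[Leaf, ...]


def compose(sigma: Tree, tau: Tree, i: int = 1) -> Optional[Tree]:
    """Graft tau onto the i-th leaf of sigma labelled root(tau).

    Returns None when the composition is undefined, i.e. when the yield
    of sigma has fewer than i occurrences of root(tau).
    """
    seen = 0
    for k, leaf in enumerate(sigma.leaves):
        if leaf == tau.root:
            seen += 1
            if seen == i:
                return Tree(sigma.root,
                            sigma.leaves[:k] + tau.leaves + sigma.leaves[k + 1:])
    return None


def obeys_word_order(tree: Tree) -> bool:
    """Position markings of the leaves must increase from left to right."""
    positions = [leaf.pos for leaf in tree.leaves if isinstance(leaf, Word)]
    return all(a < b for a, b in zip(positions, positions[1:]))


# The two small trees of the running example.
s = Tree("S", ("NP", Word("catches", 3), "NP"))
np = Tree("NP", (Word("the", 1), "*noun"))
first = compose(s, np, 1)   # graft onto the first NP leaf
second = compose(s, np, 2)  # graft onto the second NP leaf
```

Here `first` satisfies the word order constraint, whereas `second` is a legal
composition that the constraint rejects.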



Let us now reconsider grammar $G_2$ (cf. page 20) which includes prepositional
phrases as well. “The cat catches a mouse” can be parsed with grammar
$G_2$ as well (it contains all productions of $G_1$), but we are faced with a
problem. An infinite number of trees can be constructed, hence (a simulation of)
the primordial soup does not finish. Among others, the following series of
trees will emerge:

$(NP \leadsto NP\ PP)$

$(NP \leadsto NP\ {*}prep\ NP)$

$(NP \leadsto NP\ {*}prep\ NP\ PP)$

$(NP \leadsto NP\ {*}prep\ NP\ {*}prep\ NP)$

$(NP \leadsto NP\ {*}prep\ NP\ {*}prep\ NP\ PP)$
etc.

The word order constraint only affects leaves that are marked with position
numbers, i.e., words from the string. But we can continue creating larger and
larger trees without ever adding a single word. For grammar $G_2$ we guarantee
that the primordial soup process will halt by imposing a second constraint.

Width constraint: the yield of a tree may not be larger than a given fixed 
size. 

Which particular size is chosen is not important; the most natural choice is
the length of the sentence.^ For any acyclic^ grammar and any string, the
width constraint guarantees that only a finite number of different trees will
emerge. For cyclic grammars, the depth of a tree is not bounded by the width,
hence an infinite number of trees will be created. One could argue that this
is right, because a cyclic grammar, in general, yields an infinite number of
parse trees for a sentence. When all parse trees have to be delivered (and we
do not use any sophisticated techniques to represent an infinite set of trees
by a finite data structure) any parsing algorithm will run forever.

Let us now define the primordial soup parser as (an abstraction of) a 
parsing algorithm without data structures and control structures. That is, 
we define 

^ For grammars with empty productions (the right-hand side has 0 symbols) a
certain oversize could be allowed; the length of the yield may shrink by adding
an empty production to a nonterminal leaf.

® A grammar is cyclic if a symbol A can be rewritten to A by applying one or 
more productions; otherwise a grammar is acyclic. 
• which kinds of trees may occur in the primordial soup;

• an initial set of trees; 

• a composition rule that allows adding new trees to a given set of trees. 

Before we give the definition, a last bit of notation is needed. As shown in
Figure 2.2, the string is represented by words annotated with their position
and lexical category. As a general notation for this kind of initial trees we
write $(a \to \underline{a}_i)$, where $\underline{a}_i$ denotes the $i$-th word of the string.
The underlining is to distinguish the words proper from their lexical
categories.^ If $\underline{a}_i$ is lexically ambiguous, there will be several initial
trees; one may also find $(b \to \underline{a}_i)$.

Definition 2.1. (Primordial soup - simple version)

For the sake of simplicity we assume that the grammar $G$ is acyclic and
contains no empty productions (i.e. productions with zero right-hand side
symbols). We set the maximum width of a tree to the length of the string
that is to be parsed.

The primordial soup for a grammar $G$ and an arbitrary string of words is
defined as follows.

• The domain of the primordial soup comprises all well-formed trees according
to the grammar $G$ that obey both the word order constraint and the
width constraint.

• The initial set of trees contains a tree $(A \to \alpha)$ for every production
$A \to \alpha$ in grammar $G$ and $(a \to \underline{a}_i)$ for every lexical category
$a$ of the $i$-th word.

• If trees $\sigma$ and $\tau$ are present in the current set of trees and the tree
$\sigma \triangleleft \tau$ exists within the domain specified above, then
$\sigma \triangleleft \tau$ may be added to the set of trees.

A formal definition of well-formed trees is given later (cf. Definition 3.5). Here 
it should be clear from the examples what is meant. □ 
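As a rough illustration of Definition 2.1, the following Python sketch computes
the final set of trees by naive fixpoint iteration. Several things are our own
assumptions rather than part of the definition: the function names, the
encoding of a tree as a pair `(root, yield)` with marked terminals written as
`(word, position)` tuples, and the toy grammar `G1`, which is taken here to be
$G_2$ without its prepositional-phrase productions.

```python
from itertools import product


def compositions(sigma, tau):
    """Yield every composition of sigma with tau, one per matching leaf."""
    (root_s, yield_s), (root_t, yield_t) = sigma, tau
    for k, leaf in enumerate(yield_s):
        if leaf == root_t:
            yield (root_s, yield_s[:k] + yield_t + yield_s[k + 1:])


def in_domain(tree, max_width):
    """Check the word order constraint and the width constraint."""
    _, leaves = tree
    positions = [leaf[1] for leaf in leaves if isinstance(leaf, tuple)]
    increasing = all(a < b for a, b in zip(positions, positions[1:]))
    return increasing and len(leaves) <= max_width


def primordial_soup(productions, tagged_words):
    """Final set of trees for an acyclic grammar without empty productions."""
    n = len(tagged_words)
    # Initial set: production trees and one tree per lexical category per word.
    soup = {(lhs, tuple(rhs)) for lhs, rhs in productions}
    for i, (word, categories) in enumerate(tagged_words, start=1):
        soup |= {(cat, ((word, i),)) for cat in categories}
    # Keep composing until no new tree within the domain appears.
    while True:
        new = {t for sigma, tau in product(soup, repeat=2)
               for t in compositions(sigma, tau)
               if in_domain(t, n)} - soup
        if not new:
            return soup
        soup |= new


# Assumed toy grammar and the example sentence with its lexical categories.
G1 = [("S", ["NP", "VP"]), ("NP", ["*det", "*noun"]), ("VP", ["*verb", "NP"])]
sentence = [("the", ["*det"]), ("cat", ["*noun"]), ("catches", ["*verb"]),
            ("a", ["*det"]), ("mouse", ["*noun"])]

soup = primordial_soup(G1, sentence)
words = tuple((w, i) for i, (w, _) in enumerate(sentence, start=1))
full_parses = [t for t in soup
               if t[0] == "S" and all(isinstance(leaf, tuple) for leaf in t[1])]
```

Because only root and yield are stored, trees with the same linear notation
coincide in this sketch; the search order is deliberately left unspecified,
in line with the level of abstraction of the definition.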

Implicitly defined by the primordial soup specification is the final set of 
trees. This final set, in a way, gives an account of all the intermediate results 
that are created by a parser in order to find the parse trees. How this set is
computed (sequentially or in parallel? systematically or at random?) we do not
know at this level of abstraction. This is the central idea.

More restricted versions of the primordial soup - in which the final set 
contains only those intermediate results that are computed by a sensible 
parsing algorithm - can be defined by 



^ For a natural language parser, it is much more convenient to start parsing from 
the lexical categories of the words, rather than the words themselves. So, in 
most descriptions of parsing algorithms, the symbol $a_i$ denotes a lexical category,
rather than a “real” word. 
• restricting the domain of trees that are allowed to occur in the primordial
soup;

• adding restrictions to tree composition operators. 

These two kinds of restrictions are usually interchangeable. In the above 
version of the primordial soup, for example, the domain excludes trees that 
violate the word order constraint or width constraint. We could have given 
an alternative definition in which the domain of the primordial soup simply 
consists of all well-formed trees but the tree composition rule is defined only 
for those cases where none of the constraints is violated. The definition is
different but the final set of trees that is implied by the definition is the 
same. 



2.2 Restricted versions of the primordial soup 

Above we have defined the most general but also most inefficient variant 
of the primordial soup. Even for small grammars and small sentences, the 
final set of trees will be huge. We will now give a few examples of more 
efficient variants of the primordial soup. The reader who is familiar with 
parsing theory will recognize that these more sensible versions are related 
to the algorithms of Cocke-Younger-Kasami (CYK) and Earley. We also
give a version of Rytter’s algorithm, which is rather hard to comprehend in
its original form, and considerably easier to understand in the primordial soup
format.

Before we define further variants of the primordial soup we have to be 
more specific about the terminology. 

We write $a, b, \ldots$ for lexical categories;
we write $A, B, \ldots$ for nonterminals (i.e., other syntactic categories);
we write $X, Y, \ldots$ for symbols for which it does not matter whether they
refer to a nonterminal or to a lexical category;
we write $\underline{a}_i$ for the $i$-th word of the sentence; $\underline{a}_i$ is called a
marked terminal;
we write $\alpha, \beta, \ldots$ for strings of nonterminals, lexical categories and/or
marked terminals;
we write $\varepsilon$ for the empty string.

Furthermore, we define the following species of trees. 

• A complete tree is a tree of the form $(A \leadsto \underline{a}_{i+1} \ldots \underline{a}_j)$.

• A production tree is a tree $(A \to \alpha)$ with $A \to \alpha$ a production of the
grammar.

Schematic drawings of a complete tree and a production tree for a binary
production are shown in Figure 2.4. A special subspecies of complete trees is
worth mentioning.

• A terminal tree is a tree of the form $(a \to \underline{a}_i)$.
Fig. 2.4. A complete tree and a (binary branching) production tree
In the CYK version of the primordial soup, only complete trees are
constructed. The initial set contains production trees and terminal trees (but
terminal trees are a subspecies of complete trees). Hence we limit the
domain to production trees and complete trees. We will assume here that the
grammar $G$ is binary branching, i.e., every production has two symbols at
its right-hand side. Both grammars $G_1$ and $G_2$ as defined in Section 2.1 are
binary branching.

Suppose that we have a production tree $(A \to BC)$ and that we have
complete trees $(B \leadsto \underline{a}_{i+1} \ldots \underline{a}_j)$ and
$(C \leadsto \underline{a}_{j+1} \ldots \underline{a}_k)$. From these we can construct a larger
tree $(A \leadsto \underline{a}_{i+1} \ldots \underline{a}_k)$. But, since we have restricted the
allowed types of trees to production trees and complete trees, putting these 3
trees together must be done in a single operation. If the construction is done
in two steps, the intermediate product belongs to a species that is not allowed
within the domain. Hence we replace the binary composition operator by a
ternary composition operator denoted $\overset{3}{\triangleleft}$, as follows:

$\pi \overset{3}{\triangleleft} \sigma, \tau$ is defined for binary production trees $\pi$ and
complete trees $\sigma, \tau$ if $yield(\pi) = root(\sigma)\, root(\tau)$.

$\pi \overset{3}{\triangleleft} \sigma, \tau$ denotes the tree that is constructed by grafting
$\sigma$ onto the first (left) leaf and $\tau$ onto the second (right) leaf of $\pi$.

Example 2.2. (Primordial soup - CYK version)

Let $G$ be a binary branching grammar. The CYK version of the primordial
soup for $G$ and an arbitrary string of words is defined as follows.

• The species of trees in the domain are restricted to production trees and
complete trees.

• The initial set of trees contains a tree $(A \to XY)$ for every production
$A \to XY$ in grammar $G$ and $(a \to \underline{a}_i)$ for every lexical category $a$
of the $i$-th word.

• If trees $\pi, \sigma, \tau$ are in the current set and
$\pi \overset{3}{\triangleleft} \sigma, \tau$ is defined, then
$\pi \overset{3}{\triangleleft} \sigma, \tau$ may be added to the set.

The final set for “the cat catches a mouse”, according to grammar $G_2$, is
shown in Figure 2.5. □
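The CYK version lends itself to a compact sketch. The code below is an
illustration only: the name `cyk_soup`, the encoding of a complete tree as
`(root, yield)` with `(word, position)` leaves, and the way the grammar is
written down are our assumptions. The ternary composition is applied only when
the two complete trees cover adjacent stretches of the string, so that the
result is again a complete tree.

```python
def cyk_soup(binary_productions, tagged_words):
    """Complete trees (root, yield) obtainable by ternary composition."""
    # Terminal trees: one per lexical category of each word.
    complete = {(cat, ((word, i),))
                for i, (word, cats) in enumerate(tagged_words, start=1)
                for cat in cats}
    changed = True
    while changed:
        changed = False
        for lhs, (left, right) in binary_productions:
            for root_s, yield_s in list(complete):
                for root_t, yield_t in list(complete):
                    # tau must start directly after the last word of sigma.
                    adjacent = yield_s[-1][1] + 1 == yield_t[0][1]
                    if root_s == left and root_t == right and adjacent:
                        tree = (lhs, yield_s + yield_t)
                        if tree not in complete:
                            complete.add(tree)
                            changed = True
    return complete


# Productions as listed in Figure 2.5, and the example sentence.
G2 = [("S", ("NP", "VP")), ("S", ("S", "PP")),
      ("NP", ("*det", "*noun")), ("NP", ("NP", "PP")),
      ("VP", ("*verb", "NP")), ("PP", ("*prep", "NP"))]
sentence = [("the", ["*det"]), ("cat", ["*noun"]), ("catches", ["*verb"]),
            ("a", ["*det"]), ("mouse", ["*noun"])]

soup = cyk_soup(G2, sentence)
words = tuple((w, i) for i, (w, _) in enumerate(sentence, start=1))
other_complete = {t for t in soup if len(t[1]) > 1}
```

For the example sentence, `other_complete` holds the four non-terminal complete
trees listed in Figure 2.5.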

As a second example, we will make a minor variation to the CYK version
of the primordial soup. This allows us to define (an abstraction of) Rytter’s
algorithm [Rytter, 1985], [Gibbons and Rytter, 1988], which, in its original
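production trees:
    $(S \to NP\ VP)$
    $(S \to S\ PP)$
    $(NP \to {*}det\ {*}noun)$
    $(NP \to NP\ PP)$
    $(VP \to {*}verb\ NP)$
    $(PP \to {*}prep\ NP)$

terminal trees:
    $({*}det \to \underline{the}_1)$
    $({*}noun \to \underline{cat}_2)$
    $({*}verb \to \underline{catches}_3)$
    $({*}det \to \underline{a}_4)$
    $({*}noun \to \underline{mouse}_5)$

other complete trees:
    $(NP \leadsto \underline{the}_1\ \underline{cat}_2)$
    $(NP \leadsto \underline{a}_4\ \underline{mouse}_5)$
    $(VP \leadsto \underline{catches}_3\ \underline{a}_4\ \underline{mouse}_5)$
    $(S \leadsto \underline{the}_1\ \underline{cat}_2\ \underline{catches}_3\ \underline{a}_4\ \underline{mouse}_5)$

Fig. 2.5. The final set of trees in a CYK primordial soup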



form, is much more difficult to understand than CYK. We define an additional 
species of trees: 

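An almost-complete tree is a tree of one of the following forms:

$(A \to X)$,

$(A \leadsto X\ \underline{a}_{i+1} \ldots \underline{a}_j)$,

$(A \leadsto \underline{a}_{i+1} \ldots \underline{a}_j\ X)$,

$(A \leadsto \underline{a}_{i+1} \ldots \underline{a}_j\ X\ \underline{a}_{k+1} \ldots \underline{a}_l)$

with $i < j < k < l$, where applicable.

An almost-complete tree contains exactly one leaf that is not a marked
terminal. If the grammar is binary branching, trees of the form $(A \to X)$ do not
exist.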

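When we extend the domain with almost-complete trees, the tree
construction operator of CYK can be simplified. Suppose, again, that we have a
production tree $\pi = (A \to BC)$ and complete trees
$\sigma = (B \leadsto \underline{a}_{i+1} \ldots \underline{a}_j)$,
$\tau = (C \leadsto \underline{a}_{j+1} \ldots \underline{a}_k)$. Then it clearly holds that

$\pi \overset{3}{\triangleleft} \sigma, \tau = (\pi \triangleleft \sigma) \triangleleft \tau = (\pi \triangleleft \tau) \triangleleft \sigma$.

Both intermediate results

$(\pi \triangleleft \sigma) = (A \leadsto \underline{a}_{i+1} \ldots \underline{a}_j\ C)$,

$(\pi \triangleleft \tau) = (A \leadsto B\ \underline{a}_{j+1} \ldots \underline{a}_k)$

are almost-complete.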

By allowing almost-complete trees and binary tree composition we have 
created another possibility to obtain new trees. If the set of trees contains, 
for example 
Fig. 2.6. Some tree compositions according to Rytter:
(i) a production tree and a complete tree yield an almost-complete tree;
(ii) two almost-complete trees yield an almost-complete tree;
(iii) an almost-complete and a complete tree yield a complete tree



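$(A \leadsto \underline{a}_{h+1} \ldots \underline{a}_i\ B\ \underline{a}_{l+1} \ldots \underline{a}_m)$,

$(B \leadsto \underline{a}_{i+1} \ldots \underline{a}_j\ C\ \underline{a}_{k+1} \ldots \underline{a}_l)$

then these can be combined into a third almost-complete tree
$(A \leadsto \underline{a}_{h+1} \ldots \underline{a}_j\ C\ \underline{a}_{k+1} \ldots \underline{a}_m)$.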

Hence three types of tree construction can take place, which can be classified 
according to the species of trees involved, as follows: 

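• a production tree and a complete tree are merged into an almost-complete
tree, for example (cf. Figure 2.6(i)):

$(VP \to {*}verb\ NP) \triangleleft (NP \leadsto \underline{a}_4\ \underline{mouse}_5) = (VP \leadsto {*}verb\ \underline{a}_4\ \underline{mouse}_5)$

• an almost-complete tree and an almost-complete tree are merged into an
almost-complete tree, for example (cf. Figure 2.6(ii)):

$(S \leadsto \underline{the}_1\ \underline{cat}_2\ VP) \triangleleft (VP \leadsto {*}verb\ \underline{a}_4\ \underline{mouse}_5)$

$= (S \leadsto \underline{the}_1\ \underline{cat}_2\ {*}verb\ \underline{a}_4\ \underline{mouse}_5)$

• an almost-complete tree and a complete tree are merged into a complete
tree, for example (cf. Figure 2.6(iii)):

$(S \leadsto \underline{the}_1\ \underline{cat}_2\ {*}verb\ \underline{a}_4\ \underline{mouse}_5) \triangleleft ({*}verb \to \underline{catches}_3)$

$= (S \leadsto \underline{the}_1\ \underline{cat}_2\ \underline{catches}_3\ \underline{a}_4\ \underline{mouse}_5)$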

A description of Rytter’s algorithm typically defines three operators that
correspond to the three cases of tree combination outlined here. In the
primordial soup version, these three operators need not be defined explicitly;
they are a consequence of the domain definition and the general composition
rule based on $\triangleleft$.

Example 2.3. (Primordial soup - Rytter version)

Let $G$ be a binary branching grammar. The Rytter version of the primordial
soup for $G$ and an arbitrary string of words is defined as follows.

• The species of trees in the domain are restricted to

- production trees,

- complete trees,

- almost-complete trees.

• The initial set of trees contains a tree $(A \to XY)$ for every production
$A \to XY$ in grammar $G$ and $(a \to \underline{a}_i)$ for every lexical category $a$
of the $i$-th word.

• If $\sigma, \tau$ are in the current set and $\sigma \triangleleft \tau$ is defined within
the domain then $\sigma \triangleleft \tau$ may be added to the set.

The final set of trees for our simple example sentence and grammar $G_2$ is
shown in Figure 2.7. □

Rytter’s algorithm can compute all parses very fast, at the expense of
a rather large number of resources.® In Chapter 14 we will show how this
algorithm can be implemented as a boolean circuit.

Next we turn to Earley’s algorithm.® The grammar does not have to be 
binary branching and can be any context-free grammar. But the primordial 
soup stabilizes into a final state only if the grammar is acyclic. For cyclic 

® For a sentence of length $n$, the final set of trees is computed in $O(\log n)$ steps,
using $O(n^6)$ processors on a parallel random access machine.

® Note, however, that this is the bottom-up version of Earley, in which the predict 
operator is absent. Parsing can be started at any position in the string, inde- 
pendently of the left context. In Chapter 4 we will define a parsing schema for 
the conventional Earley algorithm, proceeding left-to-right and making use of 
top-down prediction. 
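production trees: as in Figure 2.5

terminal trees: as in Figure 2.5

almost-complete trees:
    $(NP \leadsto \underline{the}_1\ {*}noun)$
    $(NP \leadsto {*}det\ \underline{cat}_2)$
    $(NP \leadsto \underline{the}_1\ \underline{cat}_2\ PP)$
    $(NP \leadsto \underline{a}_4\ {*}noun)$
    $(NP \leadsto {*}det\ \underline{mouse}_5)$
    $(NP \leadsto \underline{a}_4\ \underline{mouse}_5\ PP)$
    $(VP \leadsto \underline{catches}_3\ NP)$
    $(VP \leadsto \underline{catches}_3\ \underline{a}_4\ {*}noun)$
    $(VP \leadsto \underline{catches}_3\ {*}det\ \underline{mouse}_5)$
    $(VP \leadsto {*}verb\ \underline{a}_4\ \underline{mouse}_5)$
    $(S \leadsto \underline{the}_1\ \underline{cat}_2\ VP)$
    $(S \leadsto \underline{the}_1\ \underline{cat}_2\ \underline{catches}_3\ NP)$
    $(S \leadsto \underline{the}_1\ \underline{cat}_2\ \underline{catches}_3\ \underline{a}_4\ {*}noun)$
    $(S \leadsto \underline{the}_1\ \underline{cat}_2\ \underline{catches}_3\ {*}det\ \underline{mouse}_5)$
    $(S \leadsto \underline{the}_1\ \underline{cat}_2\ {*}verb\ \underline{a}_4\ \underline{mouse}_5)$
    $(S \leadsto NP\ \underline{catches}_3\ \underline{a}_4\ \underline{mouse}_5)$
    $(S \leadsto \underline{the}_1\ {*}noun\ \underline{catches}_3\ \underline{a}_4\ \underline{mouse}_5)$
    $(S \leadsto {*}det\ \underline{cat}_2\ \underline{catches}_3\ \underline{a}_4\ \underline{mouse}_5)$
    $(S \leadsto \underline{the}_1\ \underline{cat}_2\ \underline{catches}_3\ \underline{a}_4\ \underline{mouse}_5\ PP)$

(non-terminal) complete trees:
    $(NP \leadsto \underline{the}_1\ \underline{cat}_2)$
    $(NP \leadsto \underline{a}_4\ \underline{mouse}_5)$
    $(VP \leadsto \underline{catches}_3\ \underline{a}_4\ \underline{mouse}_5)$
    $(S \leadsto \underline{the}_1\ \underline{cat}_2\ \underline{catches}_3\ \underline{a}_4\ \underline{mouse}_5)$

Fig. 2.7. The final set of trees in a Rytter primordial soup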

grammars, there is no final state and an infinite number of trees will be 
created, including the (generally) infinite number of parse trees for a sentence. 
For the Earley version of the primordial soup we define another species of 
trees. 

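An Earley tree is a tree $(A \leadsto \underline{a}_{i+1} \ldots \underline{a}_j\ \beta)$ having subtrees
$\tau_1, \ldots, \tau_k$ such that $A \to root(\tau_1) \ldots root(\tau_k)\,\beta$ is a production
of the grammar, and $yield(\tau_1) \ldots yield(\tau_k) = \underline{a}_{i+1} \ldots \underline{a}_j$.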



Fig. 2.8. An Earley tree
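production trees: as in Figure 2.5

terminal trees: as in Figure 2.5

Earley trees (excluding production trees and complete trees):
    $(NP \leadsto \underline{the}_1\ {*}noun)$
    $(NP \leadsto \underline{the}_1\ \underline{cat}_2\ PP)$
    $(NP \leadsto \underline{a}_4\ {*}noun)$
    $(NP \leadsto \underline{a}_4\ \underline{mouse}_5\ PP)$
    $(VP \leadsto \underline{catches}_3\ NP)$
    $(S \leadsto \underline{the}_1\ \underline{cat}_2\ VP)$
    $(S \leadsto \underline{the}_1\ \underline{cat}_2\ \underline{catches}_3\ \underline{a}_4\ \underline{mouse}_5\ PP)$

(non-terminal) complete trees:
    $(NP \leadsto \underline{the}_1\ \underline{cat}_2)$
    $(NP \leadsto \underline{a}_4\ \underline{mouse}_5)$
    $(VP \leadsto \underline{catches}_3\ \underline{a}_4\ \underline{mouse}_5)$
    $(S \leadsto \underline{the}_1\ \underline{cat}_2\ \underline{catches}_3\ \underline{a}_4\ \underline{mouse}_5)$

Fig. 2.9. The final set of trees in an Earley primordial soup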



A general sketch of an Earley tree is shown in Figure 2.8. The Earley tree 
has two important subspecies that have been defined already: 

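a production tree is an Earley tree with $k = 0$;
a complete tree is an Earley tree with $\beta = \varepsilon$.

If both $\sigma$ and $\sigma \triangleleft \tau$ are Earley trees then $\tau$ must belong to the
subspecies of complete trees. Thus, if a new tree $\sigma \triangleleft \tau$ is added to the
set of trees, we can distinguish two cases:

$(A \leadsto \underline{a}_{i+1} \ldots \underline{a}_j\ a\,\beta) \triangleleft (a \to \underline{a}_{j+1}) = (A \leadsto \underline{a}_{i+1} \ldots \underline{a}_{j+1}\ \beta)$

which is called scan by Earley, and

$(A \leadsto \underline{a}_{i+1} \ldots \underline{a}_j\ B\,\beta) \triangleleft (B \leadsto \underline{a}_{j+1} \ldots \underline{a}_k) = (A \leadsto \underline{a}_{i+1} \ldots \underline{a}_k\ \beta)$

which is called complete.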

As with Rytter, the division into operators involving different types of 
trees need not be specified explicitly. It is a consequence of the restriction on 
the domain. 

Example 2.4. (Primordial soup - Earley version)

Let $G$ be an arbitrary context-free grammar. The Earley version of the
primordial soup for $G$ and an arbitrary string of words is defined as follows.

• The domain is restricted to Earley trees.

• The initial set of trees contains a tree $(A \to \alpha)$ for every production
$A \to \alpha$ in grammar $G$ and $(a \to \underline{a}_i)$ for every lexical category $a$ of
the $i$-th word.

• If $\sigma, \tau$ are in the current set and $\sigma \triangleleft \tau$ is an Earley tree then
$\sigma \triangleleft \tau$ may be added to the set.



The final set of trees of our example sentence and grammar G 2 is shown in 
Figure 2.9. □ 
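The scan and complete steps of this bottom-up Earley variant can be sketched as
follows. The encoding is our own assumption: an Earley tree is stored as a
triple `(root, recognised words, remaining symbols)`, so its internal structure
is not kept, and the name `earley_soup` is hypothetical. Grammars with empty
productions are not handled by this sketch.

```python
def earley_soup(productions, tagged_words):
    """Bottom-up Earley trees as (root, recognised words, remaining symbols)."""
    # Production trees: nothing recognised yet, the whole right-hand side remains.
    soup = {(lhs, (), tuple(rhs)) for lhs, rhs in productions}
    # Terminal trees: complete trees covering a single word.
    soup |= {(cat, ((word, i),), ())
             for i, (word, cats) in enumerate(tagged_words, start=1)
             for cat in cats}
    while True:
        new = set()
        for root_s, seen, rest in soup:
            if not rest:
                continue  # sigma is complete; nothing left to attach
            for root_t, covered, rest_t in soup:
                if rest_t or not covered or root_t != rest[0]:
                    continue  # tau must be a complete tree for the next symbol
                if seen and seen[-1][1] + 1 != covered[0][1]:
                    continue  # tau must start right after the words of sigma
                # Scan (tau is a terminal tree) or complete (any other tau).
                new.add((root_s, seen + covered, rest[1:]))
        if new <= soup:
            return soup
        soup |= new


# Productions as listed in Figure 2.5, and the example sentence.
G2 = [("S", ["NP", "VP"]), ("S", ["S", "PP"]),
      ("NP", ["*det", "*noun"]), ("NP", ["NP", "PP"]),
      ("VP", ["*verb", "NP"]), ("PP", ["*prep", "NP"])]
sentence = [("the", ["*det"]), ("cat", ["*noun"]), ("catches", ["*verb"]),
            ("a", ["*det"]), ("mouse", ["*noun"])]

soup = earley_soup(G2, sentence)
w = tuple((word, i) for i, (word, _) in enumerate(sentence, start=1))
```

As in the text, scan and complete are not separate rules here: both arise from
the single composition step, restricted to results that are Earley trees.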
2.3 Extensions and related formalisms

The only way in which trees can be merged, so far, is by unifying a leaf 
of one tree with the root of another. More complicated merges could be 
allowed as well. In Figure 2.10 an example of a merge is shown in which 
larger overlapping parts, rather than single nodes, are combined so as to 
create a larger tree. 

For most algorithms such merges are not necessary (and hence, for the
sake of efficiency, are better left out of consideration). If a tree can be created
by a complicated merge, the same tree can be created by simple leaf-to-root
merges from the same elementary material that was present in the initial
primordial soup. Janssen et al. [1991] give an example of a primordial soup
variant that makes essential use of merges other than leaf-to-root merges. This
variant describes the parsing algorithm of de Vreught and Honig [1989]. The
basic idea is the following:

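Suppose there is a production tree $(A \to \alpha\beta_1\beta_2\gamma)$. A tree may emerge with
$\beta_1$ fully expanded; say

$(A \leadsto \alpha\ \underline{a}_{i+1} \ldots \underline{a}_j\ \beta_2\gamma)$.

If, at some moment in time, the primordial soup also contains a tree

$(A \leadsto \alpha\beta_1\ \underline{a}_{j+1} \ldots \underline{a}_k\ \gamma)$

in which $\beta_2$ has been fully expanded, these trees can be merged into a single
tree

$(A \leadsto \alpha\ \underline{a}_{i+1} \ldots \underline{a}_k\ \gamma)$.

The algorithm of de Vreught and Honig will be treated extensively in
Chapter 6, hence we do not go into more detail here.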

An operation on trees that fits the primordial soup metaphor very well
is tree adjoining. A special kind of tree, called an adjunct, is inserted in
the middle of another tree. This is illustrated in Figure 2.11. A tree is
“unmerged” into two trees by splitting a node into a leaf of the outer tree and
the root of the inner tree. Then the root of the adjunct is unified with the
cut leaf of the outer tree and the root of the inner tree is unified with a leaf
of the adjunct. An adjunct can be any tree that has a leaf carrying the same
label as its root. This leaf is called the foot of the adjunct.

Tree adjoining grammars (TAGs), defined by Joshi et al. [1975, 1991],
are perhaps most easily described in the primordial soup framework. For the
construction of a parse tree in a TAG two kinds of operations can be used:
composition, which is identical to our leaf-to-root merging, and tree adjoining
as explained above. Furthermore, the nodes in the initial trees may carry
labels that describe whether adjoining over that node is forbidden, mandatory
or optional. In a Lexicalized TAG [Schabes and Joshi, 1991], moreover, it is
demanded that every elementary tree contains at least one terminal. If there
Fig. 2.10. A merge over corresponding subtrees



is a lexicon that provides elementary trees for every word, then this implies 
that the entire grammar is contained in the lexicon. 

The primordial soup is not the first chemical metaphor for computation, 
or, more specifically, parsing. A “Chemical Abstract Machine” is defined by 
Berry and Boudol [1990] as an abstract model of asynchronous concurrent 
computation (a better name would have been an “Abstract Chemical Ma- 
chine”). There are two kinds of chemical reactions to create compounds: the 
first one is reversible, compounds may spontaneously decompose again. Com- 
position is irreversible when two ions with different valencies meet. 

A chemical metaphor in parsing is the “test-tube model” used by Kempen 
and Vosse [1990]. The purpose here is to create a single parse tree (the most 
likely one) for a given sentence. It is essential that composition is destructive, 
i.e., a molecule that is initially present can be used only in one compound 
at a time. Compounds that do not find other material to react with will
decompose after some time. 

Willems [1992] uses chemical composition as a metaphor for the semantics of 
natural language described by means of knowledge graphs. 
Fig. 2.11. Tree adjoining



2.4 From primordial soup to parsing schemata 

We have introduced the primordial soup as a metaphor for parsing schemata. 
In Chapters 3 and 4, parsing schemata are introduced in an abstract and 
rather more formal manner. Some of the technical details differ, but the 
general idea is identical. One specifies 

• a set of objects that constitute a domain 

• an initial set of objects 

• rules that allow a set of objects to be extended with new objects. 

Implicitly specified by such a schema is a set of valid objects; the subset of 
the domain that can be derived from the initial set following the rules. This 
set of valid objects may be finite or infinite. How such a set can be computed 
and stored is not relevant at this level of abstraction. 

The primordial soup metaphor strongly suggests that derivation of trees 
is compositional; a new tree is created by merging separately existing trees 
into a single structure. It violates the laws of chemistry when a tree can be 
derived from existing trees with which it has nothing in common. Similarly, 
the metaphor does not allow that trees are created spontaneously out of 
nothing. Yet it is rather easy and sometimes convenient to introduce a unary
composition operator $\triangleleft$ which states that $\tau$ can be added to the soup -
irrespective of its current contents - if $\triangleleft\,\tau$ holds. Such constructs are at
odds with the intuition presented here, but turn out to be useful for the
specification of parsing schemata.

An example of a rule that does not quite fit the primordial soup metaphor
(and which we have carefully circumvented above) is the predict operation in
Earley’s algorithm. In Chapter 4 this will be treated properly. Hence, like any
metaphor, the primordial soup metaphor is very useful to convey a superficial
intuition but does not fit quite so well when one digs deeper into the theory.

Nevertheless, the general idea of parsing schemata as an intermediate 
level of abstraction between grammars and algorithms has been clarified suf- 
ficiently by the examples given above. All that is left is to work out the formal 
and practical details. 




Part II 



FOUNDATION 




3. Tree-based parsing schemata



The primordial soup algorithms of Chapter 2 served to provide some
intuition of what parsing schemata do. We specify a domain of trees, an initial
set of trees and deduction steps that allow new trees to be added to a current
set of trees. Control structures and data structures must be added to turn these
specifications into sensible algorithms (and, for parallel algorithms,
communication structures as well). We will now develop a formal theory of what
we have been doing informally. Furthermore (as we have argued in Section
2.4), the primordial soup metaphor carries some connotations that should
not restrict the kind of parsing schemata that we intend to define.

A full-fledged parsing schema has a set of items, rather than trees, as 
its domain. But in order to develop a general theory of item-based parsing 
schemata one must first have a notion of what an item is. We will tackle one 
problem at a time. In this chapter we give a formal treatment of parsing
schemata based on trees. In Chapter 4, subsequently, we will investigate the 
notion of an item and add that to the formalism. Having defined a general 
formalism for parsing schemata, we will study in Chapters 5 and 6 how dif- 
ferent parsing schemata are related and how schemata can be transformed 
into other schemata. 

In Section 3.1 we recall the notion of a context-free grammar and related 
standard definitions in parsing theory. The reader who is familiar with this 
theory should still glance through it; notations differ a lot in the literature, 
and here we introduce notational conventions that are used throughout the 
remainder of this book. Furthermore, a practical linear notation for trees is 
introduced (this is an extension of the notation already employed in Chapter
2).

A tiny extension to the standard theory of context-free grammars is made 
in 3.2. If one constructs a parse, this involves two different kinds of operations: 
constructing trees and verifying that leaves of these trees match words in the 
sentence. In a parsing schema we want to do everything in terms of trees; 
hence matching that a predicted word does indeed occur in the string will 
be defined as a tree operation as well. This extension is needed for a formal 
theory of parsing schemata but, as we will see in subsequent chapters, hardly 
relevant for the description of schemata of practical algorithms. 
In 3.3 we define logical deduction systems. A parsing system for a given
grammar and a given string is just such a deduction system. A parsing 
schema, then, is a more abstract object that can be instantiated to a parsing 
system by providing it with a grammar and a string. Parsing schemata are 
introduced in Section 3.5. 

An interesting property of parsing schemata is correctness. One should be 
able to investigate whether a given parsing schema deduces the right parse 
trees (and only those). To that end, we define enhanced deduction systems in 
3.4, for which an appropriate notion of syntactic correctness can be expressed. 
Parsing systems in 3.5 are defined as enhanced deduction systems. 

The use of a logical system as an abstract notation for a chart parser (cf. 
Chapter 10) is due to Pereira and Warren [1980, 1983]. Parsing schemata have 
some deep relation with their work, but the emphasis is rather different. While 
the “Parsing as Deduction” approach is primarily interested in connecting the 
parsing logic with unification-based grammar formalisms, parsing schemata 
use deduction merely as a convenient notation in which we can describe 
context-free parsing algorithms.^ Based on this notation, we can formally 
define and investigate relationships between parsing algorithms. 



3.1 Context-free grammars 

We recall some standard notions of formal language theory (see, e.g., [Harri- 
son, 1978], [Sippu and Soisalon-Soininen, 1988]) that will be used throughout 
the remainder of this book. In addition, we introduce a convenient linear 
notation for trees that is somewhat more powerful than the notation used in 
Chapter 2. 

Definition 3.1. (strings)

Let $X$ be an arbitrary set. We write $X^+$ for the set of non-empty strings
$x_1 \ldots x_k$ $(k \geq 1)$ over $X$.

We write $X^*$ for the set of strings $x_1 \ldots x_k$ $(k \geq 0)$ over $X$. For $j = i - 1$
the sequence $x_i \ldots x_j$ denotes the empty string. For $j < i - 1$, the notation
$x_i \ldots x_j$ is undefined. □

Definition 3.2. (context-free grammar)

A context-free grammar (CFG) is a 4-tuple $G = (N, \Sigma, P, S)$ satisfying

(i) the set of nonterminals $N$ and the set of terminals $\Sigma$ are alphabets
taken from some universal class of symbols $Sym$, $N \cap \Sigma = \emptyset$;



^ In Chapter 8 we also study unification grammar parsing, but note that we are 
interested in arbitrary parsing algorithms for a simple (PATR-based) unification 
grammar formalism, whereas, e.g., Shieber [1992] describes a single (Earley) 
parsing algorithm for arbitrary unification grammars. 
(ii) the set of productions $P$ consists of a finite number of pairs $(A, \alpha)$ with
$A \in N$ and $\alpha \in (N \cup \Sigma)^*$;

(iii) the start symbol $S$ is a nonterminal symbol from $N$.

We write $\mathcal{CFG}$ for the class of context-free grammars. □

Definition 3.3. (notations)

(i) We write $V$ for $N \cup \Sigma$.

(ii) Productions $(A, \alpha)$ are written as $A \to \alpha$.

(iii) We write

$A, B, C, \ldots$ for variables ranging over $N$;
$X, Y, \ldots$ for variables ranging over $V$;
$a, b, \ldots$ for variables ranging over $\Sigma$;
$v, w, x, \ldots$ for variables ranging over $\Sigma^*$;
$\alpha, \beta, \gamma, \ldots$ for variables ranging over $V^*$;

the empty string is denoted by $\varepsilon$.

A string that is to be parsed is usually denoted $a_1 \ldots a_n$.

(iv) The relation $\Rightarrow$ on $V^* \times V^*$ is defined by
$\alpha \Rightarrow \beta$ if there are $\alpha_1, \alpha_2, A, \gamma$ such that

$\alpha = \alpha_1 A \alpha_2$, $\beta = \alpha_1 \gamma \alpha_2$ and $A \to \gamma \in P$. □

Using the notational conventions introduced in (iii), we need not state from
which set an element is taken when we talk about some (arbitrary) $a$, $A$, $\alpha$,
. . ., making the notation a little less burdensome. This practice has already
been adopted in (iv).

The relation $\Rightarrow$ is used mainly in combination with the transitive or the
transitive and reflexive closure, denoted $\Rightarrow^+$ resp. $\Rightarrow^*$.
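The derivation relation and its reflexive and transitive closure can be
illustrated by a short sketch. The names `derive_step` and `derives`, the
encoding of strings over $V$ as tuples of symbols, and the bound on the number
of steps are all our own assumptions; the bound is there only to keep the
search finite.

```python
def derive_step(alpha, productions):
    """All strings beta with alpha => beta (one production applied once)."""
    results = set()
    for k, symbol in enumerate(alpha):
        for lhs, rhs in productions:
            if symbol == lhs:
                # alpha = alpha_1 A alpha_2 becomes alpha_1 gamma alpha_2.
                results.add(alpha[:k] + tuple(rhs) + alpha[k + 1:])
    return results


def derives(alpha, beta, productions, max_steps):
    """Bounded check of alpha =>* beta using at most max_steps rewrites."""
    frontier = {tuple(alpha)}
    for _ in range(max_steps + 1):
        if tuple(beta) in frontier:
            return True
        # Strings reachable with exactly one more rewrite step.
        frontier = {b for a in frontier for b in derive_step(a, productions)}
    return False


# A small assumed grammar; lexical categories act as terminals here.
P = [("S", ("NP", "VP")), ("NP", ("*det", "*noun")), ("VP", ("*verb", "NP"))]
target = ("*det", "*noun", "*verb", "*det", "*noun")
```

With this grammar, `derives(("S",), target, P, 4)` succeeds because exactly four
rewrite steps are needed, while a bound of three steps is too small.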

Definition 3.4. (subclasses of $\mathcal{CFG}$)

We can define several useful subclasses of $\mathcal{CFG}$, the class of context-free
grammars. Often used subclasses are acyclic CFGs and $\varepsilon$-free CFGs. In Part
II we only use one subclass: grammars in Chomsky Normal Form.

• A context-free grammar $G$ is in Chomsky Normal Form (CNF) if $P$
contains productions of the form $A \to BC$ and $A \to a$ only.

We write $\mathcal{CNF}$ for the class of grammars in Chomsky Normal Form. □

Definition 3.5. (trees)

Let $U$ be the class of finitely branching finite trees in which children of a
node have a left-to-right ordering, and every node is labelled with a symbol
from Sym. For $G = (N, \Sigma, P, S) \in \mathcal{CFG}$, the set $\mathit{Trees}(G) \subset U$ is the set of
trees with labels in $N \cup \Sigma \cup \{\varepsilon\}$, in which every node $u$ satisfies one of the
following conditions:

• $u$ is a leaf;

• $u$ is labelled $A$, the children of $u$ are labelled $X_1, \ldots, X_n$ and there is a
production $A \to X_1 \cdots X_n \in P$;

• $u$ is labelled $A$, $u$ has one child labelled $\varepsilon$ and there is a production
$A \to \varepsilon \in P$.

We write $\tau, \ldots$ for tree variables. □

Definition 3.6. (root, yield)

For $G \in \mathcal{CFG}$ and $\tau \in \mathit{Trees}(G)$ we define

$\mathit{root}(\tau)$ is the label of the root of $\tau$;

$\mathit{yield}(\tau)$ is the string that is obtained by concatenating the labels of all
leaves of $\tau$ in left-to-right order. □

The leaves of $\tau$ are labelled with symbols from $V \cup \{\varepsilon\}$. The yield is a string
in $V^*$, as the empty string symbol $\varepsilon$ disappears in concatenation. Only if all
leaves are labelled $\varepsilon$ is $\mathit{yield}(\tau)$ the empty string.
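
As an illustration of Definitions 3.5 and 3.6, the following minimal sketch represents an ordered labelled tree and computes its root and yield. The class name `Tree`, the function name `tree_yield` and the use of the string "ε" as the empty-string label are assumptions made for this example only.

```python
from dataclasses import dataclass, field
from typing import List

EPSILON = "ε"  # assumed label for a leaf that stands for the empty string


@dataclass
class Tree:
    label: str
    children: List["Tree"] = field(default_factory=list)  # left-to-right order


def root(tree):
    """Label of the root node."""
    return tree.label


def tree_yield(tree):
    """Labels of all leaves, left to right; epsilon leaves contribute nothing."""
    if not tree.children:
        return [] if tree.label == EPSILON else [tree.label]
    result = []
    for child in tree.children:
        result.extend(tree_yield(child))
    return result


# A tree with root S, an expanded A, an empty B and an unexpanded leaf C.
example = Tree("S", [Tree("A", [Tree("a")]), Tree("B", [Tree(EPSILON)]), Tree("C")])
```

Here the yield of `example` is a string over V (it contains the terminal a and the nonterminal leaf C), which is why the yield is said to lie in V* rather than in Σ*.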

Definition 3.7. (parse tree)

A tree $\tau \in \mathit{Trees}(G)$ is called a parse tree or a parse for a string $a_1 \ldots a_n$ if
$\mathit{root}(\tau) = S$ and $\mathit{yield}(\tau) = a_1 \ldots a_n$.

A string in $\Sigma^*$ is called valid with respect to $G$ if it has a parse tree. A valid
string is also called a sentence. □

We introduce a convenient, linear notation for trees that will be used 
throughout the remainder of this book. 

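Definition 3.8. (linear tree notation)

An arbitrary tree with root $A \in N$ and yield $\alpha \in V^*$ is denoted $\langle A \leadsto \alpha \rangle$;
see Figure 3.1(a). Note that, in general, there are many trees satisfying these
conditions (if we want to be more specific about the structure of the tree, we
can use nested expressions as introduced below). As a special case, we write
$\langle A \to \alpha \rangle$ for a tree that has a root and a sequence of leaves, but no intermediate
nodes. Thus a tree $\langle A \to \alpha \rangle$ corresponds to a single production $A \to \alpha \in P$, see
Figure 3.1(b).

We also use nested expressions for trees. The expression

$\langle A \leadsto \alpha \langle B \leadsto \beta \rangle \gamma \rangle$

denotes a tree $\langle A \leadsto \alpha\beta\gamma \rangle$ that can be constructed by replacing the leaf $B$ in
a tree $\langle A \leadsto \alpha B \gamma \rangle$ by a subtree $\langle B \leadsto \beta \rangle$. See Figure 3.1(c). As a convenient
shorthand, a tree

$\langle A \leadsto \alpha \langle B_1 \leadsto \beta_1 \rangle \cdots \langle B_n \leadsto \beta_n \rangle \gamma \rangle$,

as shown in Figure 3.1(d), will be denoted by

$\langle A \leadsto \alpha \langle B_1 \cdots B_n \leadsto \beta_1 \cdots \beta_n \rangle \gamma \rangle$.

We write $\langle A \leadsto \alpha \langle \beta \leadsto \gamma \rangle \delta \rangle$ if there is a series of $n$ subtrees $\tau_1, \ldots, \tau_n$ such
that $\beta = \mathit{root}(\tau_1) \cdots \mathit{root}(\tau_n)$ and $\gamma = \mathit{yield}(\tau_1) \cdots \mathit{yield}(\tau_n)$. Occasionally it
will be convenient to use this notation for $n = 0$. It evidently holds that
$\langle A \leadsto \alpha \langle \varepsilon \leadsto \varepsilon \rangle \beta \rangle = \langle A \leadsto \alpha\beta \rangle$. □

3.2 Some small extensions to context-free grammars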

We introduce a small nonstandard extension to context-free grammars. This
is needed for the formal definition of tree-based parsing schemata in Section 3.5
but is hardly relevant for the following chapters.

At the end of this section we have a closer look at the status of pre- 
terminals in natural language parsing. 

The work that has to be done in parsing a sentence is mainly, but not 
exclusively, concerned with constructing (parts of) parse trees. We also have 
to verify that the constructed (partial) trees do indeed derive (part of) the 
sentence we want to have parsed. 

Consider the following hypothetical example. We have a sentence $abcde$,
and the grammar contains a production $A \to abc$. From reading the first $a$
we may conclude that the production $A \to abc$ could apply here, so we add
$\langle A \to abc \rangle$ to the set of partial trees that could contribute to the construction
of a parse for $abcde$. The next two steps then would be to verify that indeed $b$
is the second and $c$ is the third word of the sentence. Only after having done
so, we may conclude that $\langle A \to abc \rangle$ is in fact a parse tree for the subsentence
$abc$. If we parse another sentence $abd \ldots$ we will also conjecture that $\langle A \to abc \rangle$
is a partial parse, but this time it will be disqualified when we read the third
word of the sentence.

From this example it is clear that it makes sense to introduce some
notation indicating which leaves of a tree are truly part of the sentence and
which leaves are only conjectured and have to be verified still. A standard
solution is to make a difference between expanded and unexpanded leaves of a
tree. If we indicate expansion of leaves by underlining, the fact that $A \to abc$
is a partial parse for the first part of the sentence would be established by
deriving a sequence of trees

$\langle A \to \underline{a}bc \rangle$, $\langle A \to \underline{ab}c \rangle$, $\langle A \to \underline{abc} \rangle$.

We will use a slightly more subtle scheme, however, in which nodes are not 
simply expanded but expanded to a particular position in the sentence. This 
rules out any ambiguity, for example when expanded leaves labelled with 
terminals are separated by an unexpanded leaf labelled with a nonterminal. 
Furthermore, we can denote the notion “a occurs at position j in the sen- 
tence” by a particular kind of tree (it is this tree with which a terminal leaf 
is expanded). Hence the entire parsing process can be described in terms 
of tree manipulation. This is precisely what we have done informally in the 
Primordial Soup approach in Chapter 2. 

At the same time we introduce a notational convention that is used by
many parsing algorithms: the end of the sentence can be indicated by an
end-of-sentence marker, usually denoted $\$$, which is added to a string $a_1 \ldots a_n$
as the $(n+1)$-st symbol. Similarly, we may sometimes use a beginning-of-
sentence marker, denoted $\#$, which is added to a string as the 0th symbol. It
is assumed that $\#, \$ \notin V$.

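Definition 3.9. (marked terminal)

For every $G \in \mathcal{CFG}$ a marked terminal is a pair $(a, j) \in (\Sigma \cup \{\#, \$\}) \times \mathbb{N}$.
We usually write $\underline{a}_j$ rather than $(a, j)$. We also write $\underline{\Sigma}$ for $(\Sigma \cup \{\#, \$\}) \times \mathbb{N}$.
□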

The natural number $j$ is used to indicate the position of a word. For each
word $a_j$ in the sentence we will create a special tree $\langle a \to \underline{a}_j \rangle$. The sentence
can now be represented as a set of trees, rather than a string of symbols. The
initial set of trees for $a_1 \ldots a_n$ thus is

$\{\langle a \to \underline{a}_j \rangle \mid a \text{ is the } j\text{-th word of the sentence}\}$.

The beginning of the sentence in the above example is now parsed as follows.
From $\langle A \to abc \rangle$ and $\langle a \to \underline{a}_1 \rangle$ we obtain $\langle A \leadsto \underline{a}_1 bc \rangle$. With $\langle b \to \underline{b}_2 \rangle$ this
combines to $\langle A \leadsto \underline{a}_1 \underline{b}_2 c \rangle$, and so on. In this way we have replaced the concept
of expanding a terminal by combining trees.

If needed, the end-of-sentence symbol may be represented by a tree
$\langle \$ \to \underline{\$}_{n+1} \rangle$ and the beginning-of-sentence symbol by a tree $\langle \# \to \underline{\#}_0 \rangle$. In
Section 3.5 we will argue that the end-of-sentence marker is a necessary
extension, needed to infer that there is no word beyond position $n$, while the
beginning-of-sentence marker is merely a notational convenience. The
sentence, by definition, starts with word number 1.

In order to get things formally right, we have to extend some definitions. 

Definition 3.10. (extension of Definitions 3.2, 3.3, 3.5, and 3.7)

(i) A pseudo-production is a pair $(a, (a, j))$ with $(a, j)$ a marked terminal.
We usually write $a \to \underline{a}_j$ rather than $(a, (a, j))$. We write $\underline{P}$ for the set of
pseudo-productions for a particular grammar.

(ii) The variables $\alpha, \beta, \gamma, \delta, \ldots$ may range over $(N \cup \Sigma \cup \underline{\Sigma})^*$.

(iii) The class of trees $\mathit{Trees}(G)$ is extended to cover pseudo-productions as
well. That is,

• nodes carry labels from $N \cup \Sigma \cup \underline{\Sigma} \cup \{\varepsilon\}$,

• in addition to the three alternatives in Definition 3.5, a node $u$ may
be labelled with a terminal $a$ and have a single child labelled $\underline{a}_j$ for
some $j$.

(iv) A tree $\tau \in \mathit{Trees}(G)$ is called a marked parse tree or marked parse for a
sentence $a_1 \ldots a_n$ if $\mathit{root}(\tau) = S$ and $\mathit{yield}(\tau) = \underline{a}_1 \ldots \underline{a}_n$. □



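Definition 3.11. (set of marked parse trees)

The set of marked parse trees $\mathcal{P}^{(n)}_G$ for a given context-free grammar $G$ and
all strings of length $n$ is defined by

$\mathcal{P}^{(n)}_G = \{\tau \in \mathit{Trees}(G) \mid \exists a_1 \ldots a_n \in \Sigma^* :
\mathit{root}(\tau) = S \wedge \mathit{yield}(\tau) = \underline{a}_1 \ldots \underline{a}_n\}$.

The set of marked parse trees $\mathcal{P}_G(a_1 \ldots a_n)$ for a given context-free grammar
$G$ and a particular string $a_1 \ldots a_n$ is defined by

$\mathcal{P}_G(a_1 \ldots a_n) = \{\tau \in \mathit{Trees}(G) \mid \mathit{root}(\tau) = S \wedge \mathit{yield}(\tau) = \underline{a}_1 \ldots \underline{a}_n\}$. □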



A variety of parallel parsing algorithms can be formally expressed in terms 
of trees and operations on trees only. In order to cover the familiar sequential 
algorithms as well, we have to make another slight extension. One needs 
to express the fact that a tree with no marked terminals should expand 
downwards only to a particular position in the sentence. 

Definition 3.12. (left- and right-marked trees)

A left-marked tree is a pair $(i, \tau) \in \mathbb{N} \times \mathit{Trees}(G)$. We usually write $i : \tau$
rather than $(i, \tau)$.

A right-marked tree is a pair $(\tau, i) \in \mathit{Trees}(G) \times \mathbb{N}$. We usually write $\tau : i$
rather than $(\tau, i)$. □

Note that it is conceivable that trees could have a marking at some other
position in the yield, rather than leftmost or rightmost. We will not formally
define these; they will not be used in this book.

In formal language theory we distinguish between two kinds of symbols: 
terminals and nonterminals. In the analysis of natural language it is more 
convenient to distinguish three kinds of symbols: nonterminal categories, lex- 
ical (also called pre-terminal) categories, and words. From a formal point 
of view, it is clear that the words are terminals and that the pre-terminals 
are a subset of the nonterminals. In parsing algorithms, however, it is much 
more convenient to consider the lexical categories as terminals, and simply 
forget about the words. This has the advantage that the size of the grammar 
is reduced dramatically. A natural language grammar typically has not more 
than a few dozen lexical categories, while a dictionary may contain many 
thousands of words. A minor disadvantage is that a word may fall into dif- 
ferent lexical categories, hence the (pre-)terminals in the string to be parsed 
are not uniquely defined. But for our theory of parsing schemata this is not 
a problem at all. The lexical categories of the words will be represented as 
hypotheses, and there is no objection to having different hypotheses about 
a single word. Moreover, if we regard lexical categories as terminals, then
a marked terminal, as introduced above, can be seen as a lexical category 
annotated with a word at some position in the sentence. 

Lifting terminals from real words to lexical categories causes an anomaly 
for grammars in Chomsky Normal Form. Consider again grammar G 2 

S → NP VP,

S → S PP,

NP → *det *n,

NP → NP PP,

VP → *v NP,

PP → *prep NP.

If we regard *n, *v, *det and *prep as pre-terminal nonterminals this grammar
is in $\mathcal{CNF}$. If we regard the lexical categories as terminals, on the other
hand, the definition of Chomsky Normal Form must be adapted. In Chapter
2 we have avoided the issue by calling such grammars binary branching. In 
Chapters 3-6, where we develop a formal theory of parsing schemata, we 
will stick to the formal definition of Chomsky Normal Form as presented in 
Definition 3.4. 

In sum, whether lexical categories are treated by a parser as terminals 
or as preterminal nonterminals is not relevant for the theory of parsing 
schemata. 



^ Following standard convention, we abbreviate *noun and *verb to *n and *v.
Sometimes, when there are space restrictions (in many figures), we also abbreviate
*det and *prep to *d and *p.







3.3 Deduction systems 

The general concept of a deduction system, as we will present it here, con- 
forms to deduction systems as they are known in mathematical logic. The 
details of our definition are somewhat idiosyncratic, however. Here we present 
deduction systems in a way that facilitates easy definitions of derived con- 
cepts in subsequent sections and chapters. 

A deduction system contains an arbitrary set of objects, called entities.
The purpose of a deduction system, in a narrow sense, is that it allows us to
establish which entities are valid. From an initial set of hypotheses, by means
of a set of deduction steps, the validity of entities can be deduced.

The word “entity” is not supposed to mean anything, other than an iden- 
tifiable object. When we come to parsing systems and parsing schemata, these 
entities will be trees (in this chapter) or items (in the next chapters) that 
are employed by some chart parser. Note that the term “item” is, in general, 
equally void of meaning. We will give it a precise meaning in the context of 
parsing schemata in Sections 4.3 and 4.4. 

The initially valid entities in a logical deduction system are usually called 
axiomata. We use the word “hypothesis” on purpose, because it suggests 
truth of a much more volatile nature than an axiom. In a deduction system 
that is (an abstraction of) a parser for a particular grammar, the entities and 
deduction steps are fixed; the hypotheses vary according to the string that is 
to be parsed. 

Finally, where a deduction system conventionally has a set of inference 
rules, each rule having its own arity, we lump these together into a single set 
of inferences, called deduction steps. 

Definition 3.13. (deduction step, antecedent, consequent)

Let $X$ be a set of entities, $H$ a set of hypotheses. A deduction step is a pair
$(Y, x)$ with $Y \subset H \cup X$ a finite set and $x \in X$.

We write $\wp(Z)$ for the power set (i.e. the set of subsets) of any $Z$. We write
$\wp_{fin}(Z)$ for the set of all finite subsets of $Z$. Hence a deduction step $(Y, x)$ is
an element of the set $\wp_{fin}(H \cup X) \times X$.

In a deduction step $(\{y_1, \ldots, y_k\}, x)$, the entities $y_1, \ldots, y_k$ are called the
antecedents and $x$ is called the consequent of the deduction step. □

Definition 3.14. (deduction system)

A deduction system $\mathbb{D}$ is a triple $(X, H, D)$, with

$X$ a set of entities, called the domain of $\mathbb{D}$;

$H$ a set of hypotheses;

$D \subset \wp_{fin}(H \cup X) \times X$ a set of deduction steps. □

The astute reader will be astonished, perhaps, that $H$ is not necessarily a
subset of $X$. It seems rather more natural to assume $H \subset X$, and $D \subset
\wp_{fin}(X) \times X$. The reason for this idiosyncratic definition is pragmatic. It does
not do any harm to the theory when some (or all) hypotheses are outside
the domain of the deduction system. A minor nuisance is that we have to
write $H \cup X$ rather than $X$. The specification of realistic parsing schemata is
simplified, in fact, by assuming the hypotheses to be outside the domain. In
Examples 3.19 and 3.20, which will follow shortly, hypotheses are contained
in the domain as one would normally assume.

It should be noted that a deduction step may have zero antecedents. If
$(\emptyset, x) \in D$ then $x$ can always be deduced, regardless of the set of hypotheses.

If we know that $x$ can be inferred from $y_1$ and $y_2$, using a deduction step
$(\{y_1, y_2\}, x)$, then it should also be possible to infer $x$ from a superset of
the antecedents, e.g., $y_1$, $y_2$, and $y_3$. There is no guarantee, however, that
if $(\{y_1, y_2\}, x) \in D$ then also $(\{y_1, y_2, y_3\}, x) \in D$. To this end we define an
inference relation $\vdash$, that is the closure of $D$ under addition of antecedents
to an inference.

Definition 3.15. (inference relation $\vdash$)

Let $\mathbb{D} = (X, H, D)$ be a deduction system. The relation $\vdash \; \subset \wp(H \cup X) \times X$
is defined by

$Y \vdash x$ if $(Y', x) \in D$ for some $Y' \subset Y$. □

It has some practical advantages to allow an infinite set of antecedents of $\vdash$.
If, for example, some $x$ can be directly inferred from some hypotheses, we
may write $H \vdash x$, even though $H$ can be an infinite set.

When an entity can be deduced from a given set of entities by a series of
inferences, we will use the notation $\vdash^*$ (to be introduced in Definition 3.17).
The symbol $\vdash$ is reserved for a single-step inference.

Each deduction step is a valid inference: by definition it holds that $D \subset {\vdash}$.
We will use the inference symbol $\vdash$ also to define (sets of) deduction steps.
When we write $Y \vdash x$ it is usually not relevant whether $Y \vdash x$ is an element
of $D$ or $Y \vdash x$ is obtained from some $(Y', x) \in D$ with $Y' \subset Y$. The set $D$
can be considered as a defining subset of $\vdash$. We make the difference between
$D$ and $\vdash$ only because it is much easier for the specification of a deduction
system to define the “essential” subset $D$ rather than the full set of inferences
$\vdash$. In the rare cases where it is essential for some argument that a deduction
step is in $D$, and not any derived inference, we will denote the deduction step
by $(Y, x)$ rather than the informal notation $Y \vdash x$.

As a second, informal simplification of the notation, we write $y_1, \ldots, y_k \vdash x$,
rather than $\{y_1, \ldots, y_k\} \vdash x$, to indicate that the consequent $x$ can be
deduced from the antecedents $y_1, \ldots, y_k$. In most deduction systems there
is a clear distinction between entities and sets of entities and no confusion
can arise when the curly brackets are deleted. Only if a set of entities can
be an entity by itself, e.g., $y = \{y_1, \ldots, y_k\}$, the informal notation $y \vdash x$ is
ambiguous and cannot be used. In any such case where confusion could arise,
it will be stated explicitly that we switch to the formal notation where set
brackets cannot be deleted.

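Definition 3.16. (deduction sequences)

We write $X^+$ for the set of non-empty, finite sequences $x_1, \ldots, x_j$ with $j \geq 1$
and $x_i \in X$ ($1 \leq i \leq j$). Let $\mathbb{D} = (X, H, D)$ be a deduction system.

An inference sequence or a deduction sequence in $\mathbb{D}$ is a pair $(Y; x_1, \ldots, x_j) \in
\wp(H \cup X) \times X^+$, such that

$Y \cup \{x_1, \ldots, x_{i-1}\} \vdash x_i$ for $1 \leq i \leq j$.

As a practical informal notation we write

$Y \vdash x_1 \vdash \ldots \vdash x_j$

for a deduction sequence $(Y; x_1, \ldots, x_j)$.

The set of deduction sequences $\Delta(\mathbb{D}) \subset \wp(H \cup X) \times X^+$ for $\mathbb{D}$ is defined by

$\Delta(\mathbb{D}) = \{(Y; x_1, \ldots, x_j) \mid Y \vdash x_1 \vdash \ldots \vdash x_j\}$.

When it is clear from the context which deduction system is meant, we write
$\Delta$ rather than $\Delta(\mathbb{D})$. □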

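Definition 3.17. (transitive and reflexive inference relation $\vdash^*$)

Let $\mathbb{D} = (X, H, D)$ be a deduction system.

We define the relations $\vdash^0$, $\vdash^+$ and $\vdash^*$ on $\wp(H \cup X) \times X$ as follows.

$Y \vdash^0 x$ if $x \in Y$,
$Y \vdash^+ x$ if $Y \vdash \ldots \vdash x$,
$Y \vdash^* x$ if $Y \vdash^0 x$ or $Y \vdash^+ x$. □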

We do not make a distinction between semantic validity (usually denoted
$\models x$) and syntactic provability (i.e. $H \vdash^* x$). We are only concerned with
syntactic structure here; the concept of semantic validity simply doesn’t exist
in this context. A notion of correctness of a deduction system for a given 
specific purpose will be introduced in Section 3.4. 

Definition 3.18. (validity)

Let $\mathbb{D} = (X, H, D)$ be a deduction system.

The set of valid entities, denoted $V(\mathbb{D})$, is defined by

$V(\mathbb{D}) = \{x \in X \mid H \vdash^* x\}$.

We usually write $V$, rather than $V(\mathbb{D})$, if it is clear from the context which
deduction system is meant. □
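
For a finite deduction system the set of valid entities can be computed as a least fixed point: start from the hypotheses and keep applying deduction steps whose antecedents are all known. The sketch below assumes finite sets and represents a deduction step as a pair (antecedents, consequent); the function name `valid_entities` and the toy entities are illustrative only.

```python
def valid_entities(domain, hypotheses, steps):
    """Compute V = {x in X | H |-* x} for a finite deduction system (X, H, D).

    steps is an iterable of (antecedents, consequent) pairs, where the
    antecedents form a finite set drawn from H and X.
    """
    known = set(hypotheses)  # everything deducible so far, including hypotheses
    changed = True
    while changed:
        changed = False
        for antecedents, consequent in steps:
            # A step fires when all its antecedents have been deduced;
            # by the inference relation, extra known entities do no harm.
            if consequent not in known and set(antecedents) <= known:
                known.add(consequent)
                changed = True
    # Only entities of the domain count as valid; hypotheses outside X drop out.
    return known & set(domain)


# Toy system: hypothesis "h" lies outside the domain, "t" needs no antecedents.
DOMAIN = {"p", "q", "r", "s", "t"}
HYPOTHESES = {"p", "h"}
STEPS = [
    (frozenset({"p", "h"}), "q"),
    (frozenset({"q"}), "r"),
    (frozenset({"s"}), "s"),
    (frozenset(), "t"),
]
```

In this toy system "s" is not valid, because its only deduction step needs "s" itself, while "t" is valid regardless of the hypotheses.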

Example 3.19. (propositional logic)

A logical deduction system is a deduction system $(\mathit{Wff}, \mathit{Ax}, D)$ in which $\mathit{Wff}$
is a set of well-formed formulae, $\mathit{Ax} \subset \mathit{Wff}$ is a set of axioms, and $D$ a set
that contains all instantiations of all proof rules. Standard propositional logic
(see, e.g., Mendelsohn [1964]) is cast into a deduction system as follows.

$\mathit{Wff}$ is the smallest set satisfying

• some given set of proposition symbols is a subset of $\mathit{Wff}$;

• if $\varphi \in \mathit{Wff}$ then also $\neg\varphi \in \mathit{Wff}$;

• if $\varphi, \psi \in \mathit{Wff}$ then also $(\varphi \to \psi) \in \mathit{Wff}$.

A set of axioms for propositional logic is:

$\mathit{Ax}_1 = \{(\varphi \to (\psi \to \varphi)) \mid \varphi, \psi \in \mathit{Wff}\}$,

$\mathit{Ax}_2 = \{((\varphi \to (\psi \to \chi)) \to ((\varphi \to \psi) \to (\varphi \to \chi))) \mid \varphi, \psi, \chi \in \mathit{Wff}\}$,

$\mathit{Ax}_3 = \{((\neg\varphi \to \neg\psi) \to ((\neg\varphi \to \psi) \to \varphi)) \mid \varphi, \psi \in \mathit{Wff}\}$,

$\mathit{Ax} = \mathit{Ax}_1 \cup \mathit{Ax}_2 \cup \mathit{Ax}_3$.

The set of deduction steps, finally, is given by

$D = \{\varphi, (\varphi \to \psi) \vdash \psi \mid \varphi, \psi \in \mathit{Wff}\}$.

The set of deduction steps $D$ is a relation over $\mathit{Wff}^2 \times \mathit{Wff}$, and is known as
the inference rule modus ponens. □

Example 3.20. (CYK)

The CYK algorithm, named after Cocke, Younger and Kasami [Younger,
1967], [Kasami, 1965], is defined for grammars in Chomsky Normal Form (cf.
Definition 3.4). For a given grammar $(N, \Sigma, P, S)$ in $\mathcal{CNF}$ and string $a_1 \ldots a_n$
a deduction system $(X, H, D)$ for the CYK algorithm is given by

$X = \{[A, i, j] \mid A \in N \wedge 0 \leq i < j\}$;

$H = \{[A, j-1, j] \mid A \to a_j \in P \wedge 1 \leq j \leq n\}$;

$D = \{[B, i, j], [C, j, k] \vdash [A, i, k] \mid A \to BC \in P \wedge 0 \leq i < j < k\}$.

Note that the set of CYK items and the set of deduction steps are infinite, 
as they are not bounded by the length of the string. This has been done on 
purpose, in the sequel we will take care to define deduction systems in such 
a way that only the hypotheses depend on the particular string, while the 
sets of entities and deduction steps are fixed for a given grammar, hence X 
and D have to be able to cope with strings of arbitrary length. The fact that 
there is an infinite number of entities and deduction steps does not cause any 
practical problems; for parsing any given string, only a finite subset needs to 
be used. 
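
The CYK deduction system can be run directly as a closure computation over items. The sketch below is a naive fixed-point iteration, not the usual table-filling formulation of CYK; the function name `cyk_valid_items`, the tuple encoding of items [A, i, j] and the toy grammar for the language of strings a...ab...b with equally many a's and b's are assumptions made for illustration.

```python
# Hypothetical CNF grammar: S -> A B | A X, X -> S B, A -> a, B -> b.
GRAMMAR = [
    ("S", ("A", "B")),
    ("S", ("A", "X")),
    ("X", ("S", "B")),
    ("A", ("a",)),
    ("B", ("b",)),
]


def cyk_valid_items(productions, words):
    """Return the valid CYK items (A, i, j) for the given string."""
    n = len(words)
    # Hypotheses: [A, j-1, j] for every production A -> a_j.
    valid = {
        (lhs, j - 1, j)
        for j in range(1, n + 1)
        for lhs, rhs in productions
        if rhs == (words[j - 1],)
    }
    binary = [(lhs, rhs) for lhs, rhs in productions if len(rhs) == 2]
    changed = True
    while changed:
        changed = False
        for lhs, (b, c) in binary:
            # Deduction step: [B, i, j], [C, j, k] |- [A, i, k] for A -> B C.
            for left, i, j in list(valid):
                if left != b:
                    continue
                for right, j2, k in list(valid):
                    if right == c and j2 == j and (lhs, i, k) not in valid:
                        valid.add((lhs, i, k))
                        changed = True
    return valid
```

Only the hypotheses depend on the string; the loop over binary productions plays the role of the fixed set of deduction steps, of which just a finite part is ever used for a given input.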

The set of derivable CYK-items is characterized by

$V(\mathbb{D}) = \{[A, i, j] \mid A \Rightarrow^* a_{i+1} \ldots a_j\}$;

this is easily verified by induction on the length of a derivation sequence.

□

3.4 Enhanced deduction systems

In Section 3.5 and Chapter 4 we will describe (abstractions of) parsing and 
recognition algorithms by means of deduction systems. An important prop- 
erty of algorithms is correctness. For a deduction system, similarly, we need 
to be able to state that it is correct for a specific given purpose. In order to 
formally capture this property we introduce enhanced deduction systems. 

We have no semantic interpretation of deduction systems, hence we must 
establish a purely syntactic criterion for correctness. To this end we postulate 
the existence of a set of final entities, which is a subset of X. These final 
entities are divided into correct and incorrect final entities. Which ones are 
correct is known by definition. (When a deduction system is used for some 
particular purpose, there will be some motivation behind the definition of 
correct final items, but from a formal point of view the definition is arbitrary.) 
A deduction system is correct if all correct final items are valid and all
incorrect final items are invalid.

When entities are trees, for example, we can take as final entities those 
trees that constitute a parse for some sentence. The correct final items, then, 
should be all the parse trees for a given particular sentence. Incorrect final 
items are all those trees that are valid parse trees, but for other sentences 
than the one that is to be parsed. 

This is formalized as follows. 

Definition 3.21. (enhanced deduction systems)

An enhanced deduction system $\mathbb{E}$ is a quintuple $(X, H, F, C, D)$, with

$X$ a set of entities,

$H$ a finite set of hypotheses,

$F \subset X$ a set of final entities,

$C \subset F$ a set of correct final entities,

$D \subset \wp_{fin}(H \cup X) \times X$ a set of deduction steps. □

The set $F$ represents the entities that we are really interested in; the other
entities in $X \setminus (H \cup F)$ are “intermediate” entities that may help to derive the
validity of the correct final entities. It is not demanded that $F$, or even $C$,
be finite. (In a cyclic grammar, for example, most sentences have an infinite 
number of parse trees. An algorithm that enumerates all parses is correct, in 
a sense, even though it doesn’t finish.) 

The relations $\vdash$ and $\vdash^*$ are as in Definitions 3.15 and 3.17, the set of valid
entities $V(\mathbb{E})$ as in Definition 3.18.

Definition 3.22. (correctness)

Let $\mathbb{E} = (X, H, F, C, D)$ be an enhanced deduction system.

$\mathbb{E}$ is sound if all valid final entities are correct, i.e., $F \cap V(\mathbb{E}) \subset C$,

$\mathbb{E}$ is complete if all correct final entities are valid, i.e., $C \subset F \cap V(\mathbb{E})$,

$\mathbb{E}$ is correct if $\mathbb{E}$ is sound and complete, i.e., $C = F \cap V(\mathbb{E})$. □

Example 3.23. (yes/no system)

A yes/no system focuses on the question whether a particular single entity
is correct or not. A yes/no system is an enhanced deduction system of the
form

$(X, H, \{y\}, C, D)$

where $y \in X$ is the entity of which the validity is to be decided upon. □

Example 3.24. 

Any deduction system $\mathbb{D} = (X, H, D)$ can be extended to a correct enhanced
deduction system 

Example 3.25. (CYK, continued)

Consider, again, the CYK deduction system $(X, H, D)$ as in Example 3.20.
How should we define the enhanced system? That depends on what we see as
the result that should be computed by the CYK algorithm. If we see CYK as
a recognizer in the strict sense, then we only want a yes/no answer to be
delivered. Hence we can define

$F = \{[S, 0, n]\}$,

and in order to prove the correctness of the system we have to show that
$C = F$ if $a_1 \ldots a_n$ is a valid sentence and $C = \emptyset$ otherwise.

On the other hand, if we see the CYK algorithm as a parsing algorithm (of
the kind that does not deliver parse trees but a useful set of partial results),
we are interested in the entire set of valid items. From this point of view the
proper way to enhance the deduction system of Example 3.20 is to define

$F = X$,

$C = \{[A, i, j] \mid A \Rightarrow^* a_{i+1} \ldots a_j\}$.

In order to prove the correctness of CYK, according to this definition, we
have to establish that $V = C$. □
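
The correctness criteria of Definition 3.22 are plain set comparisons once F, C and V are available as finite sets. The sketch below applies them to the recognizer view of CYK; the item set is written out by hand for a hypothetical four-word input and is an assumption of this example, as are the function names.

```python
def is_sound(final, correct, valid):
    """All valid final entities are correct: F ∩ V ⊆ C."""
    return (final & valid) <= correct


def is_complete(final, correct, valid):
    """All correct final entities are valid: C ⊆ F ∩ V."""
    return correct <= (final & valid)


def is_correct(final, correct, valid):
    """Sound and complete: C = F ∩ V."""
    return correct == (final & valid)


# Hand-written valid items for an assumed sentence of length 4.
VALID = {
    ("A", 0, 1), ("A", 1, 2), ("B", 2, 3), ("B", 3, 4),
    ("S", 1, 3), ("X", 1, 4), ("S", 0, 4),
}
FINAL = {("S", 0, 4)}    # recognizer view: the single item [S, 0, n]
CORRECT = {("S", 0, 4)}  # the input is assumed to be a sentence, so C = F

# A defective system that never derives [S, 0, 4] is sound but not complete.
DEFECTIVE_VALID = VALID - {("S", 0, 4)}
```

Under the parser view one would instead take F to be the whole item set and compare V with C directly.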

The main point in defining enhanced deduction systems is that we need
a formal notion of correctness that allows us to formally define what
constitutes a correct parsing schema. From Example 3.25 it is clear that this can
be done in different ways. From a formal point of view, the first approach
is the right one. CYK is, strictly speaking, a recognition algorithm. Such an 
algorithm is correct, by definition, if it yields a single yes/no answer indicat- 
ing whether the string is correct or not. From a more practical perspective, 
however, we see CYK as a parser and adopt the second point of view. On 
top of a yes/no answer whether a string is a sentence, a set of valid items is 
recognized from which the parse trees can be constructed. It is this set V we 
are interested in, as the “output” of CYK, hence the second enhancement is
the appropriate one.

The same considerations apply to any chart parsing algorithm. It is the
final chart, the set V in our terminology, that we are interested in. The 
problem is that one cannot give a general definition of V. Which items are on 
the final chart of a parser depends, of course, on the way in which the parser 
tries to construct a parse tree. 

Hence we cannot formalize the second notion of correctness. As we are 
constructing a formal theory here, we will adopt the first notion and regard 
a chart parser as a recognition algorithm. When we come to describe parsing
systems reflecting real algorithms (that is, algorithms described in the
literature as parsers, not as simple examples), however, we won’t even define the
enhanced system but concentrate on the properties of the set $V(\mathbb{D})$ instead.



3.5 Tree-based parsing schemata 

A parsing system is a deduction system for a given grammar and string. A 
parsing schema is a more abstract object that defines a parsing system for 
arbitrary grammars and strings. 

First, we will consider a deduction system for a given grammar $G =
(N, \Sigma, P, S)$ and a given string $a_1 \ldots a_n$. The domain of such a deduction
system is a subset of $\mathit{Trees}(G)$ (including the extensions with pseudo-
productions, cf. Definition 3.10.(iii)). The set of deduction steps $D$ encodes
how new trees can be obtained from trees that have been derived already.
The initial set of trees is given by the set of hypotheses $H$.

A system is complete if all marked parse trees for the string are deduced.
A system is sound if no marked parse tree for a different string of the same
length can be deduced. It is conceivable, however, that a marked parse tree for
a string of shorter length is deduced by a sound tree-based parsing system.
Consider, for example, the case that $S \Rightarrow^* a_1 \ldots a_k$ and $S \Rightarrow^* S a_{k+1} \ldots a_n$. Then
a marked parse tree $\langle S \leadsto \underline{a}_1 \ldots \underline{a}_k \rangle$ could be found while parsing $a_1 \ldots a_n$.

Definition 3.26. ((instantiated) tree-based parsing system)

Let $G$ be a context-free grammar and $a_1 \ldots a_n \in \Sigma^*$ an arbitrary string. A
deduction system $(T, H, D)$ is called an instantiated tree-based parsing system
for $G$ and $a_1 \ldots a_n$ when the following conditions are satisfied:^

(i) $T \subset \mathit{Trees}(G)$,

(ii) $\mathcal{P}^{(n)}_G \subset T$,

(iii) $\langle a \to \underline{a}_i \rangle \in H$ for each word $a = a_i$, $1 \leq i \leq n$. □

Usually we will drop the adjective “instantiated” and talk about a tree-based
parsing system for $G$ and $a_1 \ldots a_n$.



^ (Cf. Definition 3.11 for $\mathcal{P}^{(n)}_G$.)







Definition 3.27. (correct tree-based parsing system)

An instantiated tree-based parsing system $(T, H, D)$ for a grammar $G$ and a
string $a_1 \ldots a_n$ is correct if the enhanced deduction system

$(T, H, \mathcal{P}^{(n)}_G, \mathcal{P}_G(a_1 \ldots a_n), D)$

is correct. □

The set of hypotheses H will be different for different input strings, ob- 
viously. It provides the initial trees from which everything else is derived. It 
would make sense (but it is not implied by the above definition) that the set 
of deduction steps D is not dependent on a particular input string. This will 
be a consequence of the next definition, in which we consider parsing systems 
for arbitrary strings. The fact that the domain of a parsing system should be
independent of the string that is to be parsed has been anticipated in
Definitions 3.26 and 3.27. It would suffice to demand that $\mathcal{P}_G(a_1 \ldots a_n) \subset T$,
rather than $\mathcal{P}^{(n)}_G \subset T$, so as to make sure that $(T, H, D)$ is a valid parsing
system for $a_1 \ldots a_n$. (The set of final entities would then be $\mathcal{P}^{(n)}_G \cap T$, which
does not necessarily equal $\mathcal{P}^{(n)}_G$.) Marked parse trees of strings that are not
to be parsed need not necessarily be contained in the domain of the system.
But, obviously, all marked parses of any string must be in T if it is to serve 
as a domain of a parsing system for arbitrary strings. 

Definition 3.28. (uninstantiated tree-based parsing system)

Let $G$ be a context-free grammar. An uninstantiated tree-based parsing system
for $G$ is a triple $(T, \mathcal{K}, D)$ with $\mathcal{K} : \Sigma^* \to \wp(\mathit{Trees}(G))$ a function such
that $(T, \mathcal{K}(a_1 \ldots a_n), D)$ is a tree-based parsing system for each $a_1 \ldots a_n \in
\Sigma^*$. An uninstantiated tree-based parsing system $(T, \mathcal{K}, D)$ is correct if
$(T, \mathcal{K}(a_1 \ldots a_n), D)$ is correct for each $a_1 \ldots a_n \in \Sigma^*$. □

We will blur the distinction between instantiated and uninstantiated systems 
somewhat; as a practical notation we write T(ai ... an) or simply T to denote 
both. In an instantiated system, ai . . . an denotes a particular string and in 
an uninstantiated system ai . . . an denotes a formal parameter for a string. 
This won’t cause any confusion. 

Definition 3.29. (tree-based parsing schema)

A tree-based parsing schema $\mathbf{T}$ for a class of grammars $\mathcal{CG}$ is a function
that assigns an uninstantiated tree-based parsing system to every grammar
$G \in \mathcal{CG}$.

$\mathbf{T}$ is correct if, for each $G \in \mathcal{CG}$, the uninstantiated tree-based parsing system
$\mathbf{T}(G)$ is correct. □

Definition 3.30. (the function $\mathcal{K}$)

In all examples of tree-based parsing systems and schemata we will use the
same function $\mathcal{K}$, defined by

$\mathcal{K}(a_1 \ldots a_n) = \{\langle a \to \underline{a}_i \rangle \mid a = a_i \wedge 1 \leq i \leq n\} \cup \{\langle \# \to \underline{\#}_0 \rangle, \langle \$ \to \underline{\$}_{n+1} \rangle\}$. □

The end-of-sentence marker $\$$ and beginning-of-sentence marker $\#$ are added
for convenience. In some parsing schemata it is rather more easy to define 
D(G) when every word in the sentence has a left and right neighbour, also 
the first and the last word. We will not use these sentence markers until 
Chapter 6, however. Only when we discuss filtering we will see practical 
examples of their use. 
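
A minimal sketch of the hypothesis function may help to fix ideas. It assumes that a one-step tree from a terminal to its marked terminal is encoded as a pair (label, (label, position)); this encoding, the function name `hypotheses` and the `markers` switch are illustrative choices, not part of the formalism.

```python
def hypotheses(words, markers=True):
    """Build the initial trees for a sentence: one tree per word position.

    Each tree is encoded as (a, (a, j)): the terminal a dominating the
    marked terminal for position j (positions start at 1).
    """
    trees = {(word, (word, j)) for j, word in enumerate(words, start=1)}
    if markers:
        # Optional sentence markers at positions 0 and n+1.
        trees.add(("#", ("#", 0)))
        trees.add(("$", ("$", len(words) + 1)))
    return trees
```

Because a word that occurs twice receives two different positions, the sentence is represented as a set of trees without losing word order.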

It is always possible to convert a parsing system using the
beginning-of-sentence marker into a system without it. One simply has to adapt
the deduction steps for the special case that one of their arguments would
extend to the left of position 0. The situation at the end of the sentence is
different, however. It might be essential to know that a tree of a certain
kind (in this case, the next word of the string) does not exist. As negative
information can't be handled in our formalism, the nonexistence of an
$(n+1)$-st word is stated in a positive way by the end-of-sentence marker.

As we always use the same function $\mathcal{K}$, a parsing schema $\mathbf{T}$
is fully specified by defining a pair $(\mathcal{T}(G), D(G))$ for an arbitrary
grammar $G \in \mathcal{CG}$. For each grammar $G \in \mathcal{CG}$ and string
$a_1 \ldots a_n \in \Sigma^*$ the schema materializes to a deduction system

$$\mathbf{T}(G)(a_1 \ldots a_n) = (\mathcal{T}(G), \mathcal{K}(a_1 \ldots a_n), D(G)).$$



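It is possible, although somewhat cumbersome, to give a characterization
of a universal class of parsing schemata. Let $(X \to Y)$ denote the class of
functions from $X$ to $Y$. Let $\mathcal{CG}$ be a class of context-free
grammars, $\mathit{Sym}$ a universal set of symbols from which $N$, $\Sigma$,
and $\underline{\Sigma}$ are drawn, and $\mathcal{U}$ the universal set of
labelled trees. Then $\mathcal{T} \subseteq \mathcal{U}$, or
$\mathcal{T} \in \wp(\mathcal{U})$. Furthermore,
$\mathcal{K} \in (\mathit{Sym}^* \to \wp(\mathcal{U}))$ and
$D \in \wp(\wp_{fin}(\mathcal{U}) \times \mathcal{U})$. Hence the universal
class of parsing schemata is a subclass of

$$(\mathcal{CG} \to (\wp(\mathcal{U}) \times (\mathit{Sym}^* \to \wp(\mathcal{U})) \times \wp(\wp_{fin}(\mathcal{U}) \times \mathcal{U})))$$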

One could add constraints to this huge class of objects such that only
"meaningful" elements remain, but that is not very interesting. The more
important fact is that we have formally defined a "universe of parsing
schemata", in which we can reason about schemata, define relations between
them and invent substitutions that transform a parsing schema into a
different parsing schema.

Example 3.31. (PS, a schema for the Primordial Soup algorithm) 

A Primordial Soup parsing schema $\mathbf{PS}$ is defined as follows. For an
arbitrary $G \in \mathcal{CFG}$ and $a_1 \ldots a_n \in \Sigma^*$ we define a
tree-based parsing system
$\mathbb{T}_{PS} = (\mathcal{T}, \mathcal{K}(a_1 \ldots a_n), D)$ with
$\mathcal{K}(a_1 \ldots a_n)$ as in Definition 3.30. We make use of a
predicate allowed that is true for a tree if both the word order constraint and
the width constraint are obeyed. Different definitions for these constraints
are possible; which one is chosen doesn't matter for the general idea of the
Primordial Soup schema (cf. Definition 2.1):

$\mathcal{T} = \{\tau \in \mathit{Trees}(G) \mid \mathit{allowed}(\tau)\}$;

$D^{(1)} = \{\vdash \langle A \to \alpha \rangle \mid A \to \alpha \in P\}$,

$D^{(2)} = \{\sigma, \tau \vdash \sigma \triangleleft \tau \mid \sigma \triangleleft \tau \in \mathcal{T}\}$,

$D = D^{(1)} \cup D^{(2)}$.

Unlike the intuitive version of the Primordial Soup algorithm in Chapter 2, 
we make a distinction here between elementary trees that constitute marked 
terminals (the hypotheses) and elementary trees that represent the produc- 
tions of the grammar. These last ones are included in the deduction steps, 
but do not need an antecedent. Productions are always valid, irrespective of 
the sentence. The second set of deduction steps denotes all possible instances 
of valid composition. 

The description of production trees as (antecedentless) deduction steps,
rather than hypotheses, is a consequence of the general principle that the
sentence is coded by $\mathcal{K}$, while the grammar is covered by $D$. □

Example 3.32. (TCYK, a tree-based parsing schema for CYK) 

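A tree-based parsing schema $\mathbf{TCYK}$ can be given for $\mathcal{CNF}$,
the class of grammars in Chomsky Normal Form. For an arbitrary grammar
$G \in \mathcal{CNF}$ we define
$\mathbb{T}_{CYK} = (\mathcal{T}, \mathcal{K}(a_1 \ldots a_n), D)$ by

$\mathcal{T} = \{\tau \in \mathit{Trees}(G) \mid \mathit{yield}(\tau) \in \underline{\Sigma}^*\}$;

$D^{(1)} = \{\langle a \to \underline{a}_i \rangle \vdash \langle A \to \langle a \to \underline{a}_i \rangle \rangle \mid A \to a \in P\}$,

$D^{(2)} = \{\langle B \leadsto \underline{a}_{i+1} \ldots \underline{a}_j \rangle, \langle C \leadsto \underline{a}_{j+1} \ldots \underline{a}_k \rangle \vdash \langle A \leadsto \underline{a}_{i+1} \ldots \underline{a}_k \rangle \mid A \to BC \in P\}$,

$D = D^{(1)} \cup D^{(2)}$.

The first set of deduction steps derives the nonterminals that CYK usually
starts with; these steps are needed because, for the sake of standardization,
the hypotheses cover only the terminals in the sentence. □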



3.6 Conclusion 

We have given a formal definition of tree-based parsing schemata. A deduction 
system is defined for a particular grammar and string. The hypotheses, i.e., 
the initial set of valid objects, are determined by the string. The grammar 
is encoded in the deduction steps of the system. Hence an uninstantiated
parsing system for some grammar can be instantiated to a deduction system
by providing the hypotheses for a string. A parsing schema specifies an
uninstantiated parsing system for some class of grammars.

Parsing schemata are concise, because they abstract from many practical
details. Moreover, the description is static. We have objects and rules,
but no behaviour of any kind. This will prove to be an asset in reasoning
about systems in the next chapters. Static objects are much easier to capture
formally than dynamic behaviour.

Practical parsers compute items, rather than trees, as partial results. In
the next chapter we will generalize the notion of tree-based parsing schemata
to item-based schemata. One could interpret tree-based schemata as a special
kind of item-based schemata, where every item comprises a single tree.

Many parsing algorithms, like CYK and Earley, are in fact recognition
algorithms. Such an algorithm does not construct parse trees by glueing
together partial parse trees as in a tree-based parsing system. Only the
existence of a parse tree is derived, based on items that denote the existence
of partial parse trees. In the CYK algorithm, as described in Example 3.20, an
item $[A, i, j]$ is valid iff $A \Rightarrow^* a_{i+1} \ldots a_j$. That is,
in the notation of Section 3.1, some tree
$\langle A \leadsto \underline{a}_{i+1} \ldots \underline{a}_j \rangle$ exists,
but we don't care about the structure of the particular tree. It might be the
case that several different trees exist with root $A$ and yield
$\underline{a}_{i+1} \ldots \underline{a}_j$.

Taking a more abstract view, we can see an item $[A, i, j]$ as the equivalence
class of all trees with root $A$ and yield
$\underline{a}_{i+1} \ldots \underline{a}_j$. In a more general, formal
approach we will define items as equivalence classes of trees.

From a set of recognized items a parse can be constructed in several ways.
In the CYK case, for example, once the item $[S, 0, n]$ has been recognized
one could start to build a parse tree in "top-down" fashion, retracing the
recognition steps in reverse order. For each computed $[A, i, k]$ with
$k - i > 1$ it is guaranteed that some production $A \to BC$ and position $j$
can be found such that $[B, i, j]$ and $[C, j, k]$ have been computed also.

Alternatively, one could annotate the items computed by the recognition
algorithm with information on how they were obtained. That is, when a
deduction step $[B, i, j], [C, j, k] \vdash [A, i, k]$ is successfully
applied, we add to the item $[A, i, k]$ the information that it is obtained
from symbols $B$, $C$ and position $j$. Thus all information that is needed to
construct all parses is captured in the set of computed items in a
distributed way.
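
The annotation idea can be illustrated with a minimal Python sketch. It is not part of the formalism: the function names `cyk_annotated` and `one_parse`, the dictionary encoding of the grammar, and the toy grammar are all assumptions made for the illustration. Each recognized item records the ways in which it was obtained, and a single parse is then read off by retracing those records top-down.

```python
from collections import defaultdict

def cyk_annotated(unary, binary, words):
    """Recognize items (A, i, k) and record how each one was obtained.

    unary  maps a terminal a to the nonterminals A with A -> a
    binary maps a pair (B, C) to the nonterminals A with A -> B C
    (both encodings are assumptions of this sketch)
    """
    n = len(words)
    items = defaultdict(set)
    for i, a in enumerate(words):
        for A in unary.get(a, ()):
            items[(A, i, i + 1)].add((a,))        # obtained from the word a
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            k = i + width
            for j in range(i + 1, k):
                for (B, C), heads in binary.items():
                    if (B, i, j) in items and (C, j, k) in items:
                        for A in heads:
                            # annotate [A, i, k] with symbols B, C and position j
                            items[(A, i, k)].add((B, C, j))
    return dict(items)

def one_parse(items, item):
    """Retrace the recorded steps top-down and return one parse tree."""
    A, i, k = item
    how = min(items[item], key=repr)              # deterministic choice
    if len(how) == 1:                             # annotation (a,): a word
        return (A, how[0])
    B, C, j = how
    return (A, one_parse(items, (B, i, j)), one_parse(items, (C, j, k)))

# Toy grammar (hypothetical): S -> A B, A -> a, B -> b
unary = {"a": {"A"}, "b": {"B"}}
binary = {("A", "B"): {"S"}}
items = cyk_annotated(unary, binary, ["a", "b"])
tree = one_parse(items, ("S", 0, 2))
```

Because every item keeps all of its annotations, enumerating all parses instead of one would only require iterating over the full annotation set in `one_parse`.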

Finally, if we do not limit parsing to a context-free backbone but add
semantic expressions to constituents, we might not need a parse tree anyway.
The desired result in such a grammar is the semantic expression (or the set
of different semantic expressions) that is added to the item $[S, 0, n]$. In
such an approach, the structure of the parse tree(s) is irrelevant.

From now on we will focus on parsing algorithms that do not really construct
parse trees. It suffices that a parser produces a set of valid items.

We have argued that an item can be seen as an equivalence class of trees.
But trees are grouped together into an item not just randomly, but because
they share some relevant properties. Sets of items are congruence classes
rather than equivalence classes. An item-based parsing schema can be seen
as a quotient system of a tree-based parsing schema and a congruence relation.

The notions quotient and congruence are introduced for arbitrary deduc- 
tion systems in 4.1. In 4.2 we apply this to enhanced deduction systems, 
incorporating a notion of validity. Quotient-based parsing schemata are de- 
fined in Section 4.3. 

This rather algebraic approach will provide us with an understanding of 
what an item is. Such a fundamental understanding is necessary, because 
many different algorithms employ many different kinds of items. In this the- 
oretical setting we can see these many different kinds of items as convenient 
notations for particular subtypes of items from a more universal type. 

Having dealt with the underlying algebra, we will simplify matters a lot. 
In 4.4 we define parsing schemata based on items in much the same way 
as tree-based parsing schemata were introduced in Section 3.5. Items can be 
interpreted as partial specifications of trees, rather than congruence classes of 
trees. This more liberal view makes it possible to include inconsistent items, 
i.e., partial specifications that are not matched by any well-formed tree. In 
Section 4.5 we will clarify the relation between the two definitions of parsing 
schemata and argue that inconsistent items, although incompatible with the 
theory, do not do any harm in practice. 

In 4.6, finally, we will give some nontrivial examples of parsing schemata 
for well-known parsing algorithms: the Earley algorithm (with and without 
top-down prediction) and the Left-Corner algorithm. 

A remark on notation: in this chapter equivalence classes are sometimes
regarded as sets and other times regarded as entities. In order to avoid any
possible confusion, we will use the informal notation
$y_1, \ldots, y_k \vdash x$ only in cases where it is abundantly clear that
$y_1, \ldots, y_k$ are entities rather than sets of entities, viz., in examples
of parsing schemata for well-known algorithms. When we discuss parsing systems
on a more abstract level, we only use the formally unambiguous notation
$\{y_1, \ldots, y_k\} \vdash x$ or $Y \vdash x$.



4.1 Quotient deduction systems 

In this section we are concerned with equivalence relations on an arbitrary
deduction system $\mathbb{D}$. We start by establishing some desirable
properties of equivalence relations. Next, we introduce the notion of a
congruence relation (denoted $\cong$) and show that congruence relations
satisfy these properties. In Section 4.2, subsequently, we will investigate
properties of congruence relations on an enhanced deduction system
$\mathbb{E}$.

An equivalence relation is reflexive, symmetric and transitive. Furthermore,
an equivalence relation partitions a set into equivalence classes.
We assume that the reader is familiar with these basic facts from algebra.

Let $\sim$ be an equivalence relation on a set $X$. We write $[x]_\sim$ or
simply $[x]$ for the equivalence class of $x$, i.e., the subset of $X$
containing all $x'$ such that $x' \sim x$. A quotient deduction system is the
result of contracting equivalence classes to single entities.

Definition 4.1. (quotient deduction system)

Let $\mathbb{D} = (X, H, D)$ be a deduction system, $\sim$ an equivalence
relation on $X$. Then we define the quotient system
$\mathbb{D}/{\sim} = (X/{\sim}, H/{\sim}, D/{\sim})$ by

$X/{\sim} = \{[x] \mid x \in X\}$, with $[x] = \{x' \in X \mid x' \sim x\}$ for any $x \in X$,

$H/{\sim} = \{[h] \mid h \in H\}$, with $[h] = \{h\}$ for $h \in H \setminus X$,

$D/{\sim} = \{(\{[y_1], \ldots, [y_k]\}, [x]) \mid (\{y_1, \ldots, y_k\}, x) \in D\}$.

It is left to the reader to verify that $\mathbb{D}/{\sim}$ is indeed a
deduction system. We also call $\sim$ an equivalence relation on $\mathbb{D}$,
rather than an equivalence relation on $X$. □

An inference relation $\vdash^{\sim}$ on a quotient system is defined as the
closure of the set of deduction steps $D/{\sim}$ under addition of antecedents
(cf. Definition 3.15). The transitive quotient inference relation
$\vdash^{\sim *}$ is derived from $\vdash^{\sim}$ by Definition 3.17.

On the other hand, we have a transitive inference relation $\vdash^*$ in the
deduction system $\mathbb{D}$, and when $\mathbb{D}$ is contracted to
$\mathbb{D}/{\sim}$, we obtain a quotient transitive inference relation
$\vdash^{* \sim}$, defined by

$\{[y_1], \ldots, [y_k]\} \vdash^{* \sim} [x]$ if $\{y_1, \ldots, y_k\} \vdash^* x$.

What, then, is the relation between the quotient transitive inference
relation $\vdash^{* \sim}$ and the transitive quotient inference relation
$\vdash^{\sim *}$ on $\wp(H \cup X)/{\sim} \times X/{\sim}$? It follows from
Definition 4.1 that^

$$\vdash^{* \sim} \; \subseteq \; \vdash^{\sim *}, \qquad (4.1)$$

but the reverse is not necessarily true.

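In similar fashion we can compare the equivalence classes of valid entities
with the valid equivalence classes in the quotient system. It trivially holds
that

$$\mathcal{V}(\mathbb{D})/{\sim} \subseteq \mathcal{V}(\mathbb{D}/{\sim}). \qquad (4.2)$$

For deduction sequences, similarly, we find

$$\Delta(\mathbb{D})/{\sim} \subseteq \Delta(\mathbb{D}/{\sim}). \qquad (4.3)$$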

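^ Let $\{y_1, \ldots, y_k\} \vdash z_1 \vdash \ldots \vdash z_j \vdash x$. Then,
with induction on this deduction sequence,
$\{[y_1], \ldots, [y_k]\} \vdash^{\sim} [z_1] \vdash^{\sim} \ldots \vdash^{\sim} [z_j] \vdash^{\sim} [x]$.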






4. Item-based parsing schemata 



In Theorem 4.6 we will establish sufficient conditions that guarantee equality, 
rather than set inclusion, in (4.1)-(4.3). But in order to discuss these matters, 
we will first introduce some terminology. 

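Definition 4.2. (conservation properties)

An equivalence relation $\sim$ on a deduction system $\mathbb{D}$ is called
validity conserving if

$$\mathcal{V}(\mathbb{D})/{\sim} = \mathcal{V}(\mathbb{D}/{\sim}).$$

An equivalence relation $\sim$ on a deduction system $\mathbb{D}$ is called
inference conserving if

$$\vdash^{* \sim} \; = \; \vdash^{\sim *}.$$

An equivalence relation $\sim$ on a deduction system $\mathbb{D}$ is called
deduction sequence conserving if

$$\Delta(\mathbb{D})/{\sim} = \Delta(\mathbb{D}/{\sim}).$$ □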

Corollary 4.3. 

Let ~ be an equivalence relation on some deduction system. 

If ~ is inference conserving then ~ is validity conserving. 

If ~ is deduction sequence conserving, then ~ is inference conserving. □ 

Why are we interested in all these properties? The main issue, of course, 
is validity conservation. When we discuss quotients of enhanced deduction 
systems in Section 4.2, we will establish conditions on equivalence relations 
that guarantee that a quotient system of a correct system is correct. The 
stronger notion of deduction sequence conservation is needed for a technical 
result in Chapter 5.^ The intermediate property of inference conservation is
merely useful to simplify notation. When it is known that
$\vdash^{* \sim} = \vdash^{\sim *}$ we can write $\vdash$ rather than
$\vdash^{\sim}$ and $\vdash^*$ rather than $\vdash^{\sim *}$ for inferences in
the quotient system.

Example 4.4. 

The hierarchy of equivalence relations that is implied by Corollary 4.3 is 
strict. We will give examples of deduction systems that satisfy one property 
but do not satisfy the next stronger property. 

• Let $\mathbb{D} = (X, H, D)$ be a deduction system with
$X = \{a_1, a_2, b\}$, $H = \{h\}$, and

$D = \{\{h\} \vdash a_1, \{a_2\} \vdash b\}$.

Moreover, let $\sim$ be the equivalence relation on $\mathbb{D}$ defined by
$a_1 \sim a_2$ and $x \sim x$ for any $x \in X$. Then it holds that
$[b] \in \mathcal{V}(\mathbb{D}/{\sim})$, but also
$[b] \notin \mathcal{V}(\mathbb{D})/{\sim}$. Hence $\sim$ is an equivalence
relation that is not validity conserving.

^ It is essential that the definitions of item contraction and item refinement are
based on deduction sequence conservation, rather than validity conservation, in
order to guarantee that refinement, as defined in Section 5.2, is a transitive
relation. In Section 5.1 we will see that quotients over congruence relations are
item contractions. We can make this follow as a corollary if we establish here
that congruence relations are deduction sequence conserving.

• Let $\mathbb{D} = (X, H, D)$ be a deduction system with
$X = \{a, b_1, b_2, c\}$, $H = \{h\}$, and

$D = \{\{h\} \vdash a, \{a\} \vdash b_1, \{h\} \vdash b_2, \{b_2\} \vdash c\}$.

Moreover, let $\sim$ be an equivalence relation on $\mathbb{D}$ defined by
$b_1 \sim b_2$ and $x \sim x$ for any $x \in X$. Then, clearly,
$\mathcal{V}(\mathbb{D}/{\sim}) = \mathcal{V}(\mathbb{D})/{\sim}$. But for the
inference

$\{[a]\} \vdash^{\sim *} [c]$

in $\mathbb{D}/{\sim}$ there is no corresponding inference in $\mathbb{D}$.
Hence $\sim$ is a validity conserving equivalence relation that is not
inference conserving.

• Let $\mathbb{D} = (X, H, D)$ be a deduction system with
$X = \{a_1, a_2, b, c\}$, $H = \{h\}$, and

$D = \{\{h\} \vdash a_1, \{h\} \vdash b, \{a_2\} \vdash c, \{b\} \vdash c\}$.

Moreover, let $\sim$ be an equivalence relation on $\mathbb{D}$ defined by
$a_1 \sim a_2$ and $x \sim x$ for any $x \in X$. Then it is easily verified
that $\vdash^{* \sim} = \vdash^{\sim *}$. But for the deduction sequence

$\{[h]\} \vdash^{\sim} [a_1] \vdash^{\sim} [c]$

in $\mathbb{D}/{\sim}$ there is no corresponding deduction sequence in
$\mathbb{D}$. Hence $\sim$ is an inference conserving equivalence relation
that is not deduction sequence conserving. □
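
For finite systems, validity conservation can be checked mechanically. The following Python sketch is only an illustration under stated assumptions: the helper names (`valid`, `make_cls`, `quotient`, `validity_conserving`, `step`) are hypothetical, entities are plain strings, a deduction step is a pair of a frozen antecedent set and a consequent, and validity is computed as the least fixed point of the steps over the hypotheses. It replays the first two systems of this example.

```python
def valid(H, D):
    """Entities deducible from the hypotheses H using the steps in D."""
    derived = set()
    changed = True
    while changed:
        changed = False
        for antecedents, x in D:
            if x not in derived and antecedents <= (H | derived):
                derived.add(x)
                changed = True
    return derived

def make_cls(classes):
    """Return a function mapping an entity to its equivalence class."""
    def cls(e):
        for c in classes:
            if e in c:
                return frozenset(c)
        return frozenset([e])      # unlisted entities form singleton classes
    return cls

def quotient(H, D, cls):
    """Hypotheses and deduction steps of the quotient system."""
    Hq = {cls(h) for h in H}
    Dq = {(frozenset(cls(y) for y in ants), cls(x)) for ants, x in D}
    return Hq, Dq

def validity_conserving(H, D, classes):
    """Compare V(D)/~ with V(D/~)."""
    cls = make_cls(classes)
    Hq, Dq = quotient(H, D, cls)
    return {cls(x) for x in valid(H, D)} == valid(Hq, Dq)

def step(*antecedents_then_consequent):
    *antecedents, x = antecedents_then_consequent
    return (frozenset(antecedents), x)

H = {"h"}
# First system: a1 ~ a2 makes [b] valid in the quotient only.
D1 = {step("h", "a1"), step("a2", "b")}
first = validity_conserving(H, D1, [{"a1", "a2"}])
# Second system: b1 ~ b2 keeps the valid classes the same.
D2 = {step("h", "a"), step("a", "b1"), step("h", "b2"), step("b2", "c")}
second = validity_conserving(H, D2, [{"b1", "b2"}])
```

The sketch only tests the weakest of the three properties; checking inference or deduction sequence conservation would additionally require enumerating antecedent sets or sequences.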



Next, we turn to the notion of congruence. Congruence is defined with
respect to functions over a domain. An equivalence relation $\cong$ is a
congruence relation with respect to a function $f : X^k \to X$ if for
arbitrary $x_1 \cong x'_1, \ldots, x_k \cong x'_k$ it holds that
$f(x_1, \ldots, x_k) \cong f(x'_1, \ldots, x'_k)$. Standard handbooks on
algebra (as, e.g., [Grätzer, 1979]) do not extend congruence to relations over
a domain. So we will do that first.

Let us, for the sake of simplicity, look at a binary relation $R$. We call
$\cong$ a congruence relation with respect to a relation $R$ if the following
condition is satisfied:

if $x' \cong x$ and $xRy$ then there is some $y' \cong y$ such that $x'Ry'$. (4.4)

If we apply this to a function, which is a particular kind of relation, then
(4.4) reads

if $x' \cong x$ and $y = f(x)$ then there is some $y' \cong y$ such that $y' = f(x')$

which corresponds to the standard notion of congruence. We can see $R$ as a
nondeterministic function. The same idea can be applied to a set of deduction
steps, where $\vdash$ can be seen as a nondeterministic function with a
variable number of arguments. (We will swap $x$ and $y$, however, as we have
mostly used $y$ to denote arguments and $x$ to denote consequents in deduction
steps.) If we see $\vdash$ as an action, then the notion of congruence on
deduction systems corresponds to the notion of simulation in process algebra.

Definition 4.5. (congruence relation on a deduction system)

Let $\mathbb{D} = (X, H, D)$ be a deduction system. An equivalence relation
$\cong$ is called a congruence relation on $\mathbb{D}$ if, for any
$\{y_1, \ldots, y_k\}, \{y'_1, \ldots, y'_k\} \in \wp_{fin}(H \cup X)$ and
$x \in X$ the following condition holds:

if $\{y_1, \ldots, y_k\} \vdash x$ and $y_1 \cong y'_1, \ldots, y_k \cong y'_k$

then there is some $x' \in X$ such that $x' \cong x$ and $\{y'_1, \ldots, y'_k\} \vdash x'$. □
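
For a finite deduction system the condition of Definition 4.5 can be tested by brute force. The Python sketch below is an illustration only, with hypothetical names (`make_cls`, `infers`, `is_congruence`, `step`). It relies on the assumption that $\vdash$ is the closure of the deduction steps under addition of antecedents, so that it suffices to replace the antecedents of each deduction step by congruent entities and to look for a congruent consequent.

```python
from itertools import product

def make_cls(classes):
    """Return a function mapping an entity to its congruence class."""
    def cls(e):
        for c in classes:
            if e in c:
                return frozenset(c)
        return frozenset([e])      # unlisted entities form singleton classes
    return cls

def infers(D, Y, x):
    """Y |- x: some deduction step for x has all its antecedents in Y."""
    return any(x0 == x and ants <= Y for ants, x0 in D)

def is_congruence(D, cls):
    """Check the condition of Definition 4.5 on every deduction step."""
    for ants, x in D:
        members = [sorted(cls(y)) for y in sorted(ants)]
        # every way of replacing each antecedent by a congruent entity
        for choice in product(*members):
            if not any(infers(D, frozenset(choice), x2) for x2 in cls(x)):
                return False
    return True

def step(*antecedents_then_consequent):
    *antecedents, x = antecedents_then_consequent
    return (frozenset(antecedents), x)

# a1 ~ a2 on a system where only a2 leads to b: not a congruence.
D_bad = {step("h", "a1"), step("a2", "b")}
bad = is_congruence(D_bad, make_cls([{"a1", "a2"}]))

# a1 ~ a2 and b1 ~ b2 on a system where both a's behave alike: a congruence.
D_good = {step("h", "a1"), step("a1", "b1"), step("a1", "c"),
          step("a2", "b2"), step("a2", "c")}
good = is_congruence(D_good, make_cls([{"a1", "a2"}, {"b1", "b2"}]))
```

The second system makes `is_congruence` succeed because every step from $a_1$ has a counterpart from $a_2$ with a congruent consequent.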

Theorem 4.6. (congruence relations are deduction sequence conserving)

Let $\cong$ be a congruence relation on a deduction system $\mathbb{D}$. Then

$$\Delta(\mathbb{D})/{\cong} = \Delta(\mathbb{D}/{\cong}).$$

Proof. We only have to prove
$\Delta(\mathbb{D}/{\cong}) \subseteq \Delta(\mathbb{D})/{\cong}$.

Without loss of generality, we only consider deductions with a finite set of
antecedents. Hence it suffices to prove the following claim.

Claim: Let

$$\{[y_1], \ldots, [y_k]\} \vdash^{\cong} [x_1] \vdash^{\cong} \ldots \vdash^{\cong} [x_j] \qquad (4.5)$$

for some $y_1, \ldots, y_k \in H \cup X$ and $x_1, \ldots, x_j \in X$.
Then there are $y'_1 \in [y_1], \ldots, y'_k \in [y_k]$,
$x'_1 \in [x_1], \ldots, x'_j \in [x_j]$ such that

$$\{y'_1, \ldots, y'_k\} \vdash x'_1 \vdash \ldots \vdash x'_j. \qquad (4.6)$$

We prove this claim with induction on $j$.

The basic step $j = 1$ follows straight from the definition of $\vdash^{\cong}$.

Next, assume that the claim holds for $1, \ldots, j-1$, and assume (4.5) for
some $y_1, \ldots, y_k, x_1, \ldots, x_j$. From (4.5) it follows that

$$\{[y_1], \ldots, [y_k], [x_1], \ldots, [x_{j-1}]\} \vdash^{\cong} [x_j],$$

hence, by the definition of $\vdash^{\cong}$, there are
$y''_1 \in [y_1], \ldots, y''_k \in [y_k]$,
$x''_1 \in [x_1], \ldots, x''_j \in [x_j]$ such that

$$\{y''_1, \ldots, y''_k, x''_1, \ldots, x''_{j-1}\} \vdash x''_j. \qquad (4.7)$$

Furthermore, according to the induction hypothesis, there are
$y'_1 \in [y_1], \ldots, y'_k \in [y_k]$,
$x'_1 \in [x_1], \ldots, x'_{j-1} \in [x_{j-1}]$ such that

$$\{y'_1, \ldots, y'_k\} \vdash x'_1 \vdash \ldots \vdash x'_{j-1}. \qquad (4.8)$$

From (4.7) and $y'_1 \cong y''_1, \ldots, y'_k \cong y''_k$,
$x'_1 \cong x''_1, \ldots, x'_{j-1} \cong x''_{j-1}$, the congruence property
yields $x'_j \in [x_j]$ such that

$$\{y'_1, \ldots, y'_k, x'_1, \ldots, x'_{j-1}\} \vdash x'_j, \qquad (4.9)$$

and (4.6) is obtained as a combination of (4.8) and (4.9). □



4.2 Quotients of enhanced deduction systems

In Chapter 3 we have defined enhanced deduction systems so as to introduce
a notion of syntactic correctness. Assume that an enhanced deduction system
$\mathbb{E}$ is correct, and $\cong$ is a congruence relation on $\mathbb{E}$.
Does this imply that $\mathbb{E}/{\cong}$ is also correct? We will show that
this is not generally the case and establish a sufficient condition.

First, we extend Definition 4.1 to enhanced deduction systems in the 
obvious way. 

Definition 4.7. (enhanced quotient deduction system)

Let $\mathbb{E} = (X, H, F, C, D)$ be an enhanced deduction system, $\sim$ an
equivalence relation on $X$. Then we define the quotient system
$\mathbb{E}/{\sim} = (X/{\sim}, H/{\sim}, F/{\sim}, C/{\sim}, D/{\sim})$ by
$X/{\sim}$, $H/{\sim}$, $D/{\sim}$ as in Definition 4.1 and

$F/{\sim} = \{[x] \mid x \in F\}$,

$C/{\sim} = \{[x] \mid x \in C\}$.

It is left to the reader to verify that $\mathbb{E}/{\sim}$ is indeed an
enhanced deduction system. □

Definition 4.8. (correctness preservation of equivalence relations)

Let $\mathbb{E} = (X, H, F, C, D)$ be an enhanced deduction system and $\sim$
an equivalence relation on $\mathbb{E}$.

$\sim$ is called soundness preserving if

for each $[x] \in \mathcal{V}(\mathbb{E}/{\sim}) \cap F/{\sim}$ there is some
$x' \in [x] \cap F$ such that $x' \in \mathcal{V}(\mathbb{E})$.

$\sim$ is called completeness preserving if

for each $x \in \mathcal{V}(\mathbb{E}) \cap F$ it holds that
$[x] \in \mathcal{V}(\mathbb{E}/{\sim})$.

$\sim$ is called correctness preserving if it is both soundness and
completeness preserving. □

Note that every equivalence relation is completeness preserving by definition.

Corollary 4.9.

If $\mathbb{E}$ is a correct enhanced deduction system, and $\sim$ is a
soundness preserving equivalence relation on $\mathbb{E}$, then
$\mathbb{E}/{\sim}$ is also a correct enhanced deduction system.

Example 4.10. (congruence does not preserve soundness)

We define an enhanced deduction system $\mathbb{E}$ by

$X = \{a_1, a_2, b_1, b_2, c\}$,

$H = \{h\}$,

$F = \{b_2, c\}$,

$C = \{c\}$,

$D = \{\{h\} \vdash a_1, \{a_1\} \vdash b_1, \{a_1\} \vdash c, \{a_2\} \vdash b_2, \{a_2\} \vdash c\}$.

Note that $\mathbb{E}$ is correct, because $c \in \mathcal{V}(\mathbb{E})$ and
$b_2 \notin \mathcal{V}(\mathbb{E})$. Furthermore we define a relation $\cong$
on $\mathbb{E}$ by $a_1 \cong a_2$, $b_1 \cong b_2$ and $x \cong x$ for any
$x \in X$. It is easy to verify that $\cong$ is a congruence relation.

Now we find $[b_2] \in F/{\cong}$ and $[b_2] \notin C/{\cong}$ but nevertheless
$[b_2] \in \mathcal{V}(\mathbb{E}/{\cong})$. Hence $\mathbb{E}/{\cong}$ is not
sound. □
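
The example can be replayed mechanically. The following Python sketch is an illustration, not part of the theory: the helper names are hypothetical, and it assumes that soundness means that every valid final entity is correct, i.e. $\mathcal{V} \cap F \subseteq C$. It computes the valid entities of the system and of its quotient and compares their soundness.

```python
def valid(H, D):
    """Entities deducible from the hypotheses H using the steps in D."""
    derived = set()
    changed = True
    while changed:
        changed = False
        for antecedents, x in D:
            if x not in derived and antecedents <= (H | derived):
                derived.add(x)
                changed = True
    return derived

def make_cls(classes):
    """Return a function mapping an entity to its congruence class."""
    def cls(e):
        for c in classes:
            if e in c:
                return frozenset(c)
        return frozenset([e])
    return cls

def sound(V, F, C):
    """Assumed reading of soundness: valid final entities are correct."""
    return (V & F) <= C

def step(*antecedents_then_consequent):
    *antecedents, x = antecedents_then_consequent
    return (frozenset(antecedents), x)

# The enhanced deduction system of the example.
H, F, C = {"h"}, {"b2", "c"}, {"c"}
D = {step("h", "a1"), step("a1", "b1"), step("a1", "c"),
     step("a2", "b2"), step("a2", "c")}

# Its quotient under a1 ~ a2, b1 ~ b2.
cls = make_cls([{"a1", "a2"}, {"b1", "b2"}])
Hq = {cls(h) for h in H}
Fq = {cls(x) for x in F}
Cq = {cls(x) for x in C}
Dq = {(frozenset(cls(y) for y in ants), cls(x)) for ants, x in D}

sound_before = sound(valid(H, D), F, C)
sound_after = sound(valid(Hq, Dq), Fq, Cq)
# A class containing a final entity must lie entirely inside F to avoid this.
classes_stay_in_F = all(cls(x) <= F for x in F)
```

The last line anticipates the remedy discussed next: the class of $b_2$ also contains $b_1$, which is not final, and this is exactly what makes the quotient unsound.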

In enhanced deduction systems we make a distinction between final entities
in $F$ and "intermediate" entities in $X \setminus F$. The anomaly in the above
example is caused by the fact that the congruence class $[b_1]$ contains
entities of both types. If congruence classes are subclasses either of
$X \setminus F$ or of $F$, the problem cannot occur.

Definition 4.11. {regular equivalence relation) 

Let $\mathbb{E} = (X, H, F, C, D)$ be an enhanced deduction system. An
equivalence relation $\sim$ on $\mathbb{E}$ is called regular if, for all
$x \in F$ and $x \sim x'$ it holds that $x' \in F$. □

We will in fact only be concerned with regular congruence relations, rather
than arbitrary regular equivalence relations. We write $\equiv$, rather than
$\cong$, as a standard notation for regular congruence relations.

Theorem 4.12. 

A regular congruence relation on an enhanced deduction system is correctness 
preserving. 

Proof. Let $\mathbb{E} = (X, H, F, C, D)$ be an enhanced deduction system and
$\equiv$ a regular congruence relation. Assume that $\equiv$ does not preserve
soundness. Then there is some
$[x] \in \mathcal{V}(\mathbb{E}/{\equiv}) \cap F/{\equiv}$ such that all
$x' \in [x] \cap F$ are not valid in $\mathbb{E}$. Then either $[x]$ contains
some valid $x'$ outside $F$, in which case $\equiv$ is not regular, or
$[x] \in \mathcal{V}(\mathbb{E}/{\equiv}) \setminus (\mathcal{V}(\mathbb{E})/{\equiv})$,
in which case $\equiv$ is not a congruence relation. □



4.3 Quotient parsing schemata 

After all the algebraic preparation we can now apply the results to parsing 
systems and parsing schemata. 

Definition 4.13. ((un)instantiated quotient parsing system)

An (un)instantiated quotient parsing system is a deduction system
$\mathbb{Q} = \mathbb{T}/{\equiv}$ with $\mathbb{T}$ an (un)instantiated
tree-based parsing system and $\equiv$ a regular congruence relation on
$\mathbb{T}$.

$\mathbb{T}$ is called the underlying tree-based parsing system of
$\mathbb{Q}$. □

Definition 4.14. (quotient parsing schema)

A quotient parsing schema $\mathbf{Q}$ for a class of grammars $\mathcal{CG}$
is a function that assigns an uninstantiated quotient parsing system to every
grammar $G \in \mathcal{CG}$. □

In Definition 3.8 we have introduced a practical notation for trees. This 
can be extended to a practical notation for congruence classes of trees, as 
follows. 

When we referred to a tree $\langle A \leadsto \alpha \rangle$, we meant some
particular tree. Note, however, that the tree is underspecified. There could
be many trees with root $A$ and yield $\alpha$. Typically, a congruence class
comprises all trees that suit this partial specification. In the sequel, we
write $[A \leadsto \alpha]$ for the congruence class
$[\langle A \leadsto \alpha \rangle]$ denoting all trees with root $A$ and
yield $\alpha$. Or, more generally, for any tree specification according to
Definition 3.8, we denote the set of trees satisfying the partial
specification, rather than some arbitrary tree within that set, by replacing
the outermost angle brackets by square brackets. Hence

$[A \leadsto \alpha \langle B \leadsto \beta \rangle \gamma]$

denotes the set of all trees that conform to the picture in Figure 3.1(c), and 

7] 

the set of all trees that conform to the picture in Figure 3.1(d) on page 43. 

In Section 3.5 we have defined a function $\mathcal{K}$ that assigns hypotheses
to any input string (cf. Definition 3.30). As hypotheses are not contracted in
a quotient system - or, to be very precise, each hypothesis is replaced by a
singleton set - we find that

$$\mathcal{K}(a_1 \ldots a_n)/{\equiv} = \{[a \to \underline{a}_i] \mid a = a_i\} \cup \{[\# \to \underline{\#}_0], [\$ \to \underline{\$}_{n+1}]\} \qquad (4.10)$$

for any regular congruence relation $\equiv$.

Hence a parsing schema $\mathbf{Q}$ is fully specified by a triple
$(\mathcal{T}(G), D(G), \equiv_G)$ for an arbitrary grammar
$G \in \mathcal{CG}$. For each grammar $G \in \mathcal{CG}$ and string
$a_1 \ldots a_n \in \Sigma^*$ the schema materializes to a deduction system

$$\mathbf{Q}(G)(a_1 \ldots a_n) = (\mathcal{T}(G), \mathcal{K}(a_1 \ldots a_n), D(G)) / {\equiv_G}.$$

The tree-based parsing schema $\mathbf{T}$ specified by $\mathcal{T}(G)$ and
$D(G)$ is called the underlying tree-based parsing schema of $\mathbf{Q}$.

Corollary 4.15. A quotient parsing system Q is sound/complete/correct if 
and only if the underlying parsing system T is sound/complete/correct. 

A quotient parsing schema Q is sound/complete/correct if and only if the 
underlying parsing schema T is sound/complete/correct. □ 

Example 4.16. (QCYK, a quotient parsing schema for CYK) 

A quotient parsing schema $\mathbf{QCYK}$ can be given for $\mathcal{CNF}$, the
class of grammars in Chomsky Normal Form.

For any grammar $G \in \mathcal{CNF}$ we define the relation $\equiv$ on
$\mathbb{T}_{CYK}$ by

$\langle A \leadsto \underline{a}_{i+1} \ldots \underline{a}_j \rangle \equiv \langle B \leadsto \underline{a}_{k+1} \ldots \underline{a}_l \rangle$ if $A = B$, $i = k$, and $j = l$.

The fact that $\equiv$ is a congruence relation can be established
straightforwardly (and we will not take the trouble to write it out in a
formal manner). Let
$\tau = \langle A \leadsto \underline{a}_{i+1} \ldots \underline{a}_j \rangle$
and $\sigma \equiv \tau$. If $\tau$ is used in a deduction step, then the
internal structure of the tree is irrelevant. The tree $\sigma$ has the same
root $A$ and positions $i$ and $j$, and can be used in exactly the same manner
to deduce larger trees that are congruent to trees deduced by $\tau$.

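For an arbitrary grammar $G \in \mathcal{CNF}$ we define
$\mathbb{Q}_{CYK} = (\mathcal{I}, \mathcal{K}(a_1 \ldots a_n), D)$ by

$\mathcal{I} = \{[A \leadsto \underline{a}_{i+1} \ldots \underline{a}_j] \mid A \in N \wedge \underline{a}_{i+1} \ldots \underline{a}_j \in \underline{\Sigma}^*\}$;

$D^{(1)} = \{[a \to \underline{a}_i] \vdash [A \to \langle a \to \underline{a}_i \rangle] \mid A \to a \in P\}$,

$D^{(2)} = \{[B \leadsto \underline{a}_{i+1} \ldots \underline{a}_j], [C \leadsto \underline{a}_{j+1} \ldots \underline{a}_k] \vdash [A \leadsto \underline{a}_{i+1} \ldots \underline{a}_k] \mid A \to BC \in P\}$,

$D = D^{(1)} \cup D^{(2)}$,

and $\mathcal{K}(a_1 \ldots a_n)$ as in (4.10).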

When we apply the usual denotation $[A, i, j]$ for an item
$[A \leadsto \underline{a}_{i+1} \ldots \underline{a}_j]$ we get the following
simplified description of $\mathbb{Q}_{CYK}$:

$\mathcal{I}' = \{[A, i, j] \mid A \in N \wedge 0 \le i < j \wedge \exists \tau \in \mathit{Trees}(G) : \mathit{root}(\tau) = A \wedge \mathit{yield}(\tau) \in \Sigma^* \wedge |\mathit{yield}(\tau)| = j - i\}$;

$D^{(1')} = \{[a \to \underline{a}_i] \vdash [A, i-1, i] \mid A \to a \in P\}$,

$D^{(2')} = \{[B, i, j], [C, j, k] \vdash [A, i, k] \mid A \to BC \in P\}$,

$D' = D^{(1')} \cup D^{(2')}$.

There is but one difference with the conventional description of the CYK
schema: only those items $[A, i, j]$ are in the domain for which there is at
least one tree $\tau \in [A, i, j]$. It could happen, for example, that $A$
only produces strings of even length. In that case, items $[A, i, j]$ with odd
values of $j - i$ must be excluded from the domain. An empty congruence class
is a contradiction in terms and violates the underlying mathematical theory.
In Section 4.5 we will see how to deal with this problem from a practical
point of view. □
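
The simplified description can be read as a recipe for computing the valid items. The Python sketch below is an illustration under stated assumptions: the function name `cyk_closure`, the encoding of productions as tuples, and the toy grammar are hypothetical, and items are indexed from position 0 so that the hypothesis for the word at index `i` spans positions `i` to `i + 1`. It closes the hypotheses under $D^{(1')}$ and $D^{(2')}$.

```python
def cyk_closure(unary, binary, words):
    """Items [A, i, j] deducible with D(1') and D(2') for the given words.

    unary  is a set of pairs (A, a)       for productions A -> a
    binary is a set of triples (A, B, C)  for productions A -> B C
    (both encodings are assumptions of this sketch)
    """
    valid = set()
    # D(1'): the hypothesis for the word at index i yields [A, i, i+1] if A -> a.
    for i, a in enumerate(words):
        for A, rhs in unary:
            if rhs == a:
                valid.add((A, i, i + 1))
    # D(2'): [B, i, j], [C, j, k] |- [A, i, k] if A -> B C, up to a fixed point.
    changed = True
    while changed:
        changed = False
        for A, B, C in binary:
            for (X, i, j) in list(valid):
                for (Y, j2, k) in list(valid):
                    if X == B and Y == C and j == j2 and (A, i, k) not in valid:
                        valid.add((A, i, k))
                        changed = True
    return valid

# Toy grammar (hypothetical): S -> B B, B -> b, so S only yields strings of length 2.
unary = {("B", "b")}
binary = {("S", "B", "B")}
items = cyk_closure(unary, binary, ["b", "b", "b"])
```

In this toy grammar an item such as $[S, 0, 3]$ corresponds to no tree at all. The closure never derives it, which is in line with the remark that such items, if admitted to the domain, cannot become valid.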



We have now clarified the ontological status of an item: a congruence 
class of trees in a deduction system. This is not unimportant. One of the 
advantages of the formalism developed here is that any item-based parser can 
be described in it. Different algorithms use different items; it is impossible 
to predict which particular type of item is going to be used in a parsing 
algorithm that will be discovered next week. For that reason we need such 
an ontological understanding. Whatever new type of parsing items somebody
is going to introduce someday, it will capture those partial specifications of
trees that matter for the deduction relation. That is, trees satisfying the
same partial specification are congruent.

When it comes to the use of items in the description of practical parsing 
schemata, we can simplify matters a lot. In the next section we will specify 
the domain of a parsing schema directly as a set of items, rather than a 
quotient of a domain of trees. 



4.4 Item-based parsing schemata 

Having stated that - in principle - item-based parsing schemata can be de- 
scribed as quotients of tree-based parsing schemata, we will now take a much 
more practical approach. We may interpret an item as a partial specification 
of a tree. If there is a set of trees that conforms to this partial specification, 
then this set comprises an equivalence class (or indeed a congruence class) in 
the domain of trees for the grammar. An anomaly that may occur, however, is
that such a partial specification is inconsistent: there is not a single tree
that satisfies the specification. Hence, such an item must be associated with
an empty set of trees. A parsing system will be called regular if it is
(equivalent to) a quotient system. The theory of Section 4.3 is only defined
on regular subsystems. For practical applications the difference is a minor
one: for all parsing schemata that we will deal with, one can argue that the
introduction of inconsistent items does not affect the correctness of the
schema. We will treat this problem at length in Section 4.5.

We will now proceed to define item-based parsing schemata (in the sequel 
simply called parsing schemata) in much the same way as tree-based parsing 
schemata were introduced in Section 3.5. For the domain of a system we do
not take a subset of $\mathit{Trees}(G)$ but a subset of a partition of
$\mathit{Trees}(G)$.

A partition $\Pi(X) \subseteq \wp(X)$ is a collection of pairwise disjoint
non-empty subsets of $X$ such that every $x \in X$ is contained in some
$\pi \in \Pi(X)$. Every partition $\Pi$ defines an equivalence relation
$\sim_\Pi$ by

$x \sim_\Pi y$ if there is a $\pi \in \Pi(X)$ such that $\{x, y\} \subseteq \pi$.

Conversely, if $\sim$ is an equivalence relation on $X$ then $X/{\sim}$ is a
partition of $X$.
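
The correspondence between partitions and equivalence relations can be made concrete for finite sets. The following Python sketch is only an illustration; the function names and the sample set are hypothetical. It turns a partition into the relation $\sim_\Pi$ as a set of pairs, and recovers the partition from the relation as the set of equivalence classes.

```python
def relation_from_partition(partition):
    """x ~ y iff some block of the partition contains both x and y."""
    return {(x, y) for block in partition for x in block for y in block}

def partition_from_relation(X, related):
    """The quotient X/~: the set of equivalence classes of ~ on X."""
    return {frozenset(y for y in X if related(x, y)) for x in X}

X = {1, 2, 3, 4}
partition = {frozenset({1, 2}), frozenset({3}), frozenset({4})}
pairs = relation_from_partition(partition)
round_trip = partition_from_relation(X, lambda x, y: (x, y) in pairs)
```

Going from the partition to the relation and back yields the original partition, which is the round trip stated in the text.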

Definition 4.17. (item set)

Let $\mathit{Trees}(G)$ be the set of trees for some context-free grammar $G$.
A set $\mathcal{I} \subseteq \wp(\mathit{Trees}(G))$ is called an item set if
there is a partition $\Pi$ of $\mathit{Trees}(G)$ such that

$$\mathcal{I} \subseteq \Pi(\mathit{Trees}(G)) \cup \{\emptyset\}.$$ □

Definition 4.18. (types of items)

Let $\mathcal{I}$ be an item set.

• An item $\iota \in \mathcal{I}$ is called empty if $\iota = \emptyset$.

• A non-empty item $\iota \in \mathcal{I}$ is called completed if, for each
$\tau \in \iota$, $\tau$ is a marked parse tree for some sentence.

• A non-empty item $\iota \in \mathcal{I}$ is called intermediate if, for each
$\tau \in \iota$, $\tau$ is not a marked parse tree for any sentence.

• An item $\iota \in \mathcal{I}$ is called mixed if there are
$\sigma, \tau \in \iota$ such that $\sigma$ is a marked parse tree and $\tau$
is not a marked parse tree. □

Definition 4.19. (regular and semiregular item set)

An item set $\mathcal{I}$ is called regular if it contains neither mixed items
nor the empty item.

An item set $\mathcal{I}$ is called semiregular if it does not contain mixed
items. □
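
The four item types and the two regularity notions translate directly into checks on finite sets. The Python sketch below is a minimal illustration: items are modelled as frozensets of opaque tree names, and the predicate `is_parse_tree`, which decides whether a tree is a marked parse tree, is a hypothetical stand-in supplied by the caller.

```python
def item_type(item, is_parse_tree):
    """Classify an item (a set of trees) following Definition 4.18."""
    if not item:
        return "empty"
    kinds = {bool(is_parse_tree(t)) for t in item}
    if kinds == {True}:
        return "completed"
    if kinds == {False}:
        return "intermediate"
    return "mixed"

def is_semiregular(items, is_parse_tree):
    """No mixed items (Definition 4.19)."""
    return all(item_type(i, is_parse_tree) != "mixed" for i in items)

def is_regular(items, is_parse_tree):
    """Neither mixed items nor the empty item (Definition 4.19)."""
    return all(item_type(i, is_parse_tree) in ("completed", "intermediate")
               for i in items)

# Hypothetical trees: names starting with "S:" stand for marked parse trees.
parse = lambda t: t.startswith("S:")
completed = frozenset({"S:t1", "S:t2"})
intermediate = frozenset({"A:t3"})
mixed = frozenset({"S:t4", "A:t5"})
empty = frozenset()

types = [item_type(i, parse) for i in (completed, intermediate, mixed, empty)]
semi = is_semiregular({completed, intermediate, empty}, parse)
reg = is_regular({completed, intermediate, empty}, parse)
```

The last two lines show the difference between the two notions: an item set that contains the empty item but no mixed items is semiregular without being regular.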



Definition 4.20. {final items) 

Let 77 be a partition of Trees {G) for some context-free grammar G, ai . . . an ^ 
U* a string. The set of final items Tq^jj for a string of length n is defined 
by^ 

^GM = {i € n {Trees (G)) | 3r € t : r G 
The set of correct final items Cqm for a string ai . . . On is defined by 

. . . Gn) = G n{Trees{G)) \ 3t G l\t G Vg{(Ii • • • On)}- □ 

The intention of Definition 4.20 should be clear. An item-based parser will 
be correct if all correct final items can be deduced from H and no other final 
items. 

After these preliminaries, the following definitions will not come as a 
surprise. 



Definition 4.21. ((instantiated) parsing system)

Let $G$ be a context-free grammar and $a_1 \ldots a_n \in \Sigma^*$ an
arbitrary string. A deduction system $(\mathcal{I}, H, D)$ is called an
instantiated parsing system for $G$ and $a_1 \ldots a_n$ when the following
conditions are satisfied:

(i) $\mathcal{I} = \mathcal{I}(G, \Pi)$ is an item set,

(n) 

(iii) $[a \to \underline{a}_i] \in H$ for each $a_i$, $1 \le i \le n$.



□ 



Definition 4.22. {correct parsing system) 

An instantiated parsing system $(\mathcal{I}, H, D)$ for a grammar $G$ and a
string $a_1 \ldots a_n$ is correct if the enhanced deduction system

is correct. □ 



^ See Definition 3.11 for ^ and VG{ai ... an).

Definition 4.23. (uninstantiated parsing system)

Let $G$ be a context-free grammar. An uninstantiated parsing system for $G$ is
a triple $(\mathcal{I}, \mathcal{K}, D)$ where
$\mathcal{K} : \Sigma^* \to \wp(\wp(\mathit{Trees}(G)))$ is a function such
that $(\mathcal{I}, \mathcal{K}(a_1 \ldots a_n), D)$ is a parsing system for
each $a_1 \ldots a_n \in \Sigma^*$.

An uninstantiated parsing system $(\mathcal{I}, \mathcal{K}, D)$ is correct if
$(\mathcal{I}, \mathcal{K}(a_1 \ldots a_n), D)$ is correct for each
$a_1 \ldots a_n \in \Sigma^*$. □

We will not make a clear distinction between instantiated and uninstantiated 
parsing systems and write P(ai ... an) or simply P to denote both. 

Definition 4.24. (parsing schema)

A parsing schema $\mathbf{P}$ for a class of grammars $\mathcal{CG}$ is a function that assigns
an uninstantiated parsing system to every grammar $G \in \mathcal{CG}$.

$\mathbf{P}$ is correct if, for each $G \in \mathcal{CG}$, the uninstantiated parsing system $\mathbf{P}(G)$ is
correct. □

Definition 4.25. (regular parsing schemata)

A parsing system $\langle \mathcal{I}, H, D \rangle$ is regular if $\mathcal{I}$ is a regular item set.

A parsing schema $\mathbf{P}$ for a class of grammars $\mathcal{CG}$ is regular if, for each $G \in \mathcal{CG}$
and each $a_1 \ldots a_n \in \Sigma^*$, the parsing system $\mathbf{P}(G)(a_1 \ldots a_n)$ is regular. □

Definition 4.26. (the function $\mathcal{K}$)

In all examples of parsing systems and schemata we will use the same
function $\mathcal{K}$, defined by

$\mathcal{K}(a_1 \ldots a_n) = \{[a \to \underline{a}_i] \mid a = a_i\} \cup \{[\# \to \underline{a}_0], [\$ \to \underline{a}_{n+1}]\}$.

As a more conventional notation for hypothesis items we will write $[a, i-1, i]$
rather than $[a \to \underline{a}_i]$. The end-of-sentence marker is denoted by $[\$, n, n+1]$,
the beginning-of-sentence marker by $[\#, -1, 0]$. □
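
As a small illustration of the conventional notation, the hypotheses for a given string can be computed as follows. This is a sketch only: the function name `hypotheses` and the encoding of an item $[a, i-1, i]$ as a Python triple are assumptions made for the example.

```python
def hypotheses(words):
    """Return K(a_1 ... a_n) as a set of triples (symbol, left, right)."""
    n = len(words)
    # The word at 0-based index i is a_{i+1}, so its item [a, i, i+1]
    # is the triple (a, i, i + 1).
    items = {(a, i, i + 1) for i, a in enumerate(words)}
    # End-of-sentence and beginning-of-sentence markers.
    items |= {("$", n, n + 1), ("#", -1, 0)}
    return items
```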

As we have fixed the function $\mathcal{K}$, one only needs to specify $\mathcal{I}(G, \Pi)$
and $D(G)$ for an arbitrary grammar $G$ and a partition $\Pi(G)$ of $Trees(G)$
in order to give a full specification of a parsing schema. For each grammar
$G \in \mathcal{CG}$ and string $a_1 \ldots a_n \in \Sigma^*$ the schema materializes to a deduction
system

$\mathbf{P}(G)(a_1 \ldots a_n) = \langle \mathcal{I}(G, \Pi), \mathcal{K}(a_1 \ldots a_n), D(G) \rangle$.



For the reader who really wants to know what kind of object a recognition 
schema is, in terms of set theory, it is remarked that the universal class of 
parsing schemata can be characterized as a sub-class of 

{Cg (p(p(W)) X X p{pfin{p{U)) x p{H)))) 

Again, the fact that the universal class of parsing schemata can be formally 
defined is rather more important than the particular structure of this type. 
Example 4.27. (the CYK parsing schema)

At last, we define an item-based parsing schema CYK by giving an item-based
parsing system $\mathbb{P}_{CYK}$ for arbitrary grammars $G \in \mathcal{CNF}$.

Let $[A, i, j]$ be an abbreviated notation for an item $[A \leadsto \underline{a}_{i+1} \ldots \underline{a}_j]$. Then
$\mathbb{P}_{CYK}$ is defined by

$\mathcal{I} = \{[A, i, j] \mid A \in N \wedge 0 \le i < j\}$;

$D^{(1)} = \{[a, i-1, i] \vdash [A, i-1, i] \mid A \to a \in P\}$,

$D^{(2)} = \{[B, i, j], [C, j, k] \vdash [A, i, k] \mid A \to BC \in P\}$,

$D = D^{(1)} \cup D^{(2)}$.

Note the difference with Example 4.16, where the domain contained only
those items such that there is a tree that fits the item. Here we do allow
items for which such a tree does not exist, hence CYK is not a regular
parsing schema. But in the sequel we will show that CYK is semiregular,
which is good enough for all practical purposes.

Thus we have obtained a CYK schema within the setting of a formal theory
of parsing schemata that conforms to the intuitive CYK deduction system
presented in Example 3.20. □
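
To make the deduction steps concrete, the set of valid CYK items for a given string can be computed as a naive closure under the two kinds of steps. The following Python sketch is an illustration, not the canonical CYK algorithm with its table-filling order; the encoding of a grammar in Chomsky normal form as two sets (`unary` with pairs for productions $A \to a$, `binary` with triples for productions $A \to BC$) is an assumption made for this example.

```python
def cyk_valid_items(unary, binary, words):
    """Close the hypotheses for `words` under the two kinds of CYK steps.

    unary:  set of pairs (A, a) for productions A -> a
    binary: set of triples (A, B, C) for productions A -> B C
    Returns the set of valid items (A, i, j).
    """
    # Steps [a, i-1, i] |- [A, i-1, i]: one item per matching production.
    valid = {(A, i, i + 1)
             for i, a in enumerate(words)
             for (A, b) in unary if a == b}
    # Steps [B, i, j], [C, j, k] |- [A, i, k], applied until nothing new appears.
    changed = True
    while changed:
        changed = False
        for (A, B, C) in binary:
            for (X, i, j) in list(valid):
                if X != B:
                    continue
                for (Y, j2, k) in list(valid):
                    if Y == C and j2 == j and (A, i, k) not in valid:
                        valid.add((A, i, k))
                        changed = True
    return valid
```

A string $a_1 \ldots a_n$ is recognized when the item for the start symbol spanning positions 0 to $n$ is among the valid items.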



4.5 The relation between Sections 4.3 and 4.4 

A pain in the neck in the development of our theory so far is the problem of 
the empty item. We will now address this problem in some more detail and 
argue that it can be ignored for all practical purposes. 

Let A be a nonterminal that produces only strings of even length. Then the
item [A, 0, 3] - the set of trees $\langle A \leadsto \underline{a}_1 \underline{a}_2 \underline{a}_3 \rangle$ for arbitrary $a_1 a_2 a_3$ - is empty.
Many items can be empty, clearly. If, for example, B does not produce trees
with a yield shorter than 4, the item [B, 0, 3] is empty as well. By definition,
there is only one empty set. Hence, as items are sets, empty items must be
identical. This seems counter-intuitive, to say the least, because the reasons
for which [B, 0, 3] is invalid are quite different from the reasons for which
[A, 0, 3] can’t be deduced. This problem can be handled in several ways.

• The fundamental solution is to make a distinction between items and item 
descriptions. Such an approach is chosen in the formalization of feature 
structures (cf. [Kasper and Rounds, 1986], [Rounds and Kasper, 1986]), 
where a distinction is made between feature structures and feature de- 
scriptions. 

In this context this is not an attractive option, however, because it carries 
the obligation to formally define a rather more complicated item descrip- 
tion language, based on a notion of constraints on trees rather than sets 
of trees. Moreover, using sets of trees as the fundamental notion is more
general, because whatever constraint language is used to denote such con- 
straints, these implicitly define sets of trees. 

• The easy way out is not to allow the empty item in the domain. This is 
mathematically the most elegant option. To any definition of an item set $\mathcal{I}$
one could add that the parsing system operates only on $\mathcal{I} \setminus \{\emptyset\}$. Moreover,
if the empty item is not part of the domain, it can be shown that every 
item-based parsing schema is in fact a quotient of a tree-based parsing 
schema, and the theories of Sections 4.3 and 4.4 are equivalent. 

From a practical point of view, however, this option has the disadvantage
that it is not clear a priori which items are empty. Moreover, a parsing 
schema for, e.g., the CYK algorithm would not be fully compatible with 
the canonical algorithm as found in the literature, where empty items are 
not excluded from the domain. 

• The last option, finally, is simply to live with the fact that there are dif- 
ferent denotations of a single empty item. This does not do any real harm, 
as long as it is guaranteed that the empty item is invalid, which seems a 
reasonable demand. When a parsing system is constructed by defining a 
regular congruence relation on a tree-based parsing system, it is logically 
impossible to arrive at a system that contains the empty item as an entity 
in the domain. Hence it surely can’t be deduced. 

This option is the most attractive, because it allows the most simple def- 
inition of parsing systems in a way that does not strain the compatibility 
with algorithm descriptions found in the literature. 

To our framework it adds the burden that we always have to show that the 
empty item is invalid. This is hardly a burden, however, as for any sensible 
parsing system this property comes about naturally.^ 

Thus we allow a liberal form of parsing system specification, which may 
contain different denotations of the empty item. Every deduction step in a 
parsing system that is actually going to be used for the construction of a 
parse will be contained in the regular subsystem. 

A more positive way of phrasing this design choice for our theory is the 
following. We acknowledge that there is a difference between items and item 
descriptions, but we do not prescribe a specific item description language. 
Any item description language that allows parsing schemata to be defined is
acceptable, because the theorems are based on the items themselves, rather
than on item descriptions.



^ If it is ensured that the empty item (in all its denotations) is invalid, then, 
obviously, introduction of the empty item does not affect the correctness of a 
system. This will be the case in all parsing schemata that are introduced in 
the sequel. It is not a necessary condition, however. One could envisage parsing 
systems in which some denotations of the empty item can be deduced under 
more specific conditions. 
Definition 4.28. (semiregular parsing systems and schemata)

A parsing system $\mathbb{P} = \langle \mathcal{I}, H, D \rangle$ is semiregular if

(i) $\mathcal{I}$ is a semiregular item set,
(ii) $\emptyset \notin \mathcal{V}(\mathbb{P})$.

A parsing schema $\mathbf{P}$ for a class of grammars $\mathcal{CG}$ is semiregular if, for each
$G \in \mathcal{CG}$ and each $a_1 \ldots a_n \in \Sigma^*$, the parsing system $\mathbf{P}(G)(a_1 \ldots a_n)$ is
semiregular. □

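Definition 4.29. (regular subsystems and schemata)

Let $\mathbb{P} = \langle \mathcal{I}, H, D \rangle$ be a semiregular parsing system.

We define a regular subsystem $\mathbb{P}^r = \langle \mathcal{I}^r, H, D^r \rangle$ by

$\mathcal{I}^r = \mathcal{I} \setminus \{\emptyset\}$,

$D^r = \{(Y, x) \in D \mid Y \subseteq H \cup \mathcal{I}^r \wedge x \in \mathcal{I}^r\}$.

For a semiregular parsing schema $\mathbf{P}$ for a class of grammars $\mathcal{CG}$ we define a
regular subschema $\mathbf{P}^r$ by

$\mathbf{P}^r(G)(a_1 \ldots a_n) = (\mathbf{P}(G)(a_1 \ldots a_n))^r$. □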

Corollary 4.30.

A semiregular parsing system $\mathbb{P}$ is sound / complete / correct if and only if
its regular subsystem $\mathbb{P}^r$ is sound / complete / correct.

A semiregular parsing schema $\mathbf{P}$ is sound / complete / correct if and only if
its regular subschema $\mathbf{P}^r$ is sound / complete / correct. □
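
For a finite deduction system the construction of the regular subsystem can be sketched as follows. The sketch reads Definition 4.29 as keeping exactly those deduction steps in which neither an antecedent nor the consequent is the empty item. The representation is an assumption made for the example: items and hypotheses are frozensets of trees, and a deduction step is a pair of a frozenset of antecedents and a consequent.

```python
EMPTY = frozenset()


def regular_subsystem(items, hypotheses, steps):
    """Drop the empty item and every deduction step that mentions it.

    items, hypotheses: sets of frozensets of trees
    steps: set of pairs (frozenset of antecedents, consequent)
    """
    items_r = {item for item in items if item != EMPTY}
    allowed = set(hypotheses) | items_r
    steps_r = {(ante, cons) for (ante, cons) in steps
               if set(ante) <= allowed and cons in items_r}
    return items_r, set(hypotheses), steps_r
```

In a semiregular system the empty item is not valid, so none of the discarded steps can contribute to the deduction of a valid item; this is the intuition behind Corollary 4.30.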

In Section 6.1 we will see that restricting a semi-regular parsing system to 
a fully regular parsing system is a special case of a more general operation 
called step deletion. 

Although it is obvious how to regularize a semi-regular system in theory, 
this might not be so obvious in practice. When one is confronted with a 
specification of an item set by means of constraints, it might be rather hard 
to establish whether a tree exists that satisfies those constraints. Hence, as we 
have extensively argued above, we settle for semi-regular parsing schemata. 
As long as the semi-regularity constraint is obeyed - which is typically a 
trivial property - we may safely ignore the empty item and its different 
denotations. 

We can now formally clarify the relation between the quotient parsing 
schemata of Section 4.3 and item-based parsing schemata of Section 4.4. We 
cannot describe, in general, semiregular parsing schemata as quotients, but 
we can do so with their regular subschemata. Everything outside a regular
subschema has been added for convenience of description but is of no
importance to the correctness of a schema.
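Theorem 4.31. (regular parsing systems are quotient systems)

Let $\mathbb{P} = \langle \mathcal{I}, H, D \rangle$ be a regular parsing system. Then there is a tree-based
parsing system $\mathbb{T} = \langle \mathcal{T}_{\mathbb{P}}, H', D_{\mathbb{P}} \rangle$ and a regular congruence relation $\equiv_{\mathbb{P}}$ on
$\mathbb{T}$ such that^

$\langle \mathcal{T}_{\mathbb{P}}, H', D_{\mathbb{P}} \rangle / {\equiv_{\mathbb{P}}} \cong \langle \mathcal{I}, H, D \rangle$.

Moreover, $\mathbb{P}$ is correct if and only if $\langle \mathcal{T}_{\mathbb{P}}, H', D_{\mathbb{P}} \rangle / {\equiv_{\mathbb{P}}}$ is correct.

Proof. We define

$\mathcal{T}_{\mathbb{P}} = \{\tau \in Trees(G) \mid \exists \iota \in \mathcal{I} : \tau \in \iota\}$,

$H' = \{\langle a \to \underline{a}_i \rangle \mid [a \to \underline{a}_i] \in H\}$,

$\equiv_{\mathbb{P}} = \{(\sigma, \tau) \in \mathcal{T}_{\mathbb{P}} \times \mathcal{T}_{\mathbb{P}} \mid \exists \iota \in \mathcal{I} : \sigma, \tau \in \iota\}$,

$D_{\mathbb{P}} = \{\sigma_1, \ldots, \sigma_k \vdash \tau \in \wp_{fin}(\mathcal{T}_{\mathbb{P}} \cup H') \times \mathcal{T}_{\mathbb{P}} \mid [\sigma_1], \ldots, [\sigma_k] \vdash [\tau] \in D\}$.

It follows straightforwardly that $\equiv_{\mathbb{P}}$ is a regular congruence relation and that
$\langle \mathcal{T}_{\mathbb{P}}, H', D_{\mathbb{P}} \rangle / {\equiv_{\mathbb{P}}} \cong \langle \mathcal{I}, H, D \rangle$.

It is left to the reader to verify that correctness of $\mathbb{P}$ (according to Definitions
4.22-4.24) and correctness of $\langle \mathcal{T}_{\mathbb{P}}, H', D_{\mathbb{P}} \rangle / {\equiv_{\mathbb{P}}}$ (according to Definitions
3.27-3.29 and Corollary 4.15) are equivalent. □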



4.6 Examples of parsing schemata 

After all the theory in the previous sections, we will now present a few ex- 
amples of nontrivial parsing schemata. We define schemata for the Earley 
parser (both the conventional one with top-down prediction and the bottom- 
up Earley parser introduced without prediction) and the Left-Corner parser. 

The reader is reminded that all parsing schemata have the same function
$\mathcal{K}$ that assigns hypotheses to parsing systems (cf. Definition 4.26). We
will slightly simplify the notation of the hypotheses. For an arbitrary string
$a_1 \ldots a_n$ we define a set of hypotheses

$\mathcal{K}(a_1 \ldots a_n) = \{[a, i-1, i] \mid a = a_i\} \cup \{[\$, n, n+1], [\#, -1, 0]\}$.

The beginning-of-sentence marker and end-of-sentence marker are in fact not 
needed here. In Chapter 6 some examples are given where these hypotheses 
are essential. 

Before we start describing the parsing schemata, a few more notational
conventions are useful. Note that, by definition, $D \subseteq \wp_{fin}(H \cup \mathcal{I}) \times \mathcal{I}$. Thus,
if we write (parts of) $D$ in the format

$\{y_1, \ldots, y_k \vdash x\}$

without any further conditions, this is to be interpreted as

$\{y_1, \ldots, y_k \vdash x \mid \{y_1, \ldots, y_k\} \subseteq H \cup \mathcal{I} \wedge x \in \mathcal{I}\}$.

Furthermore, the sets $\mathcal{I}$ and $D$ in a parsing system will be subscripted with
the name of the schema of which this system is an instantiation. A parsing
system $\mathbf{Earley}(G)(a_1 \ldots a_n)$, for example, will be denoted as a triple
$\langle \mathcal{I}_{Earley}, H, D_{Earley} \rangle$. The subscripts can always be deleted if the name of the
parsing schema is clear from the context.

Often, $\mathcal{I}$ and $D$ are defined as a union of different subsets. These subsets
are always indicated with superscripts. Superscripts may also be deleted if
it is not relevant in some context which particular subset of $\mathcal{I}$ or $D$ is being
referred to.

^ We anticipate the formal definition of isomorphism ($\cong$) that will be given in
Definition 5.4. It should be obvious what is meant.

Example 4.32. (the Earley parsing schema)

We will define a parsing schema Earley on $\mathcal{CFG}$ by giving a parsing system
for an arbitrary grammar $G \in \mathcal{CFG}$.

We will first define a parsing schema using the conventional Earley items,
and afterwards show that the set of Earley items for a particular grammar is
a semiregular item set according to Definition 4.19.

The parsing schema Earley is defined by specifying a parsing system $\mathbb{P}_{Earley}$
for an arbitrary grammar $G$ as follows:

$\mathcal{I}_{Earley} = \{[A \to \alpha{\bullet}\beta, i, j] \mid A \to \alpha\beta \in P \wedge 0 \le i \le j\}$;

$D^{Init} = \{\vdash [S \to {\bullet}\gamma, 0, 0]\}$,

$D^{Scan} = \{[A \to \alpha{\bullet}a\beta, i, j], [a, j, j+1] \vdash [A \to \alpha a{\bullet}\beta, i, j+1]\}$,

$D^{Compl} = \{[A \to \alpha{\bullet}B\beta, i, j], [B \to \gamma{\bullet}, j, k] \vdash [A \to \alpha B{\bullet}\beta, i, k]\}$,

$D^{Pred} = \{[A \to \alpha{\bullet}B\beta, i, j] \vdash [B \to {\bullet}\gamma, j, j]\}$,

$D_{Earley} = D^{Init} \cup D^{Scan} \cup D^{Compl} \cup D^{Pred}$.

$D^{Scan}$, $D^{Compl}$ and $D^{Pred}$ conform to the scan, complete, and predict steps,
respectively, of the Earley algorithm.

A schematic illustration of the complete step is given in Figure 4.1.
The deduction steps $D^{Init}$ add the axioms that are needed to start the parser, in
addition to the hypotheses derived from the sentence.

Fig. 4.1. The complete step

The set of final items of $\mathbb{P}_{Earley}$ (cf. Definition 4.20) and the subset of correct
final items are:

$\mathcal{F} = \{[S \to \gamma{\bullet}, 0, n]\}$,

$\mathcal{C} = \{[S \to \gamma{\bullet}, 0, n] \mid \gamma \Rightarrow^* a_1 \ldots a_n\}$.

The set of valid items that is computed by the system is:

$\mathcal{V}(\mathbb{P}_{Earley}) = \{[A \to \alpha{\bullet}\beta, i, j] \mid \alpha \Rightarrow^* a_{i+1} \ldots a_j \wedge S \Rightarrow^* a_1 \ldots a_i A \gamma$ for some $\gamma\}$,

conforming to the Earley parser as it is known from the literature.

It is clear that a final item in $\mathcal{F}$ is valid if and only if it is a correct final
item from $\mathcal{C}$. Therefore, $\mathbb{P}_{Earley}$ is correct for arbitrary $G$ and $a_1 \ldots a_n$ and
Earley is a correct parsing schema (for a formal proof, see [Sikkel, 1995,
forthcoming]).

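An Earley item $[A \to \alpha{\bullet}\beta, i, j]$ is, in fact, a shorthand notation for the set of
trees defined by

$[A \to \langle \alpha \leadsto \underline{a}_{i+1} \ldots \underline{a}_j \rangle \beta]$.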

Special attention has to be paid to the case a = e. Such items have no 
marked leaf but can only be applied at a specific position in the sentence. 
Hence these are left-marked items, i.e., items containing left-marked trees (cf. 
Definition 3.12): 

[j ■ 

Having introduced the concept of left-marked items we could also denote 
them in the format of arbitrary Earley items 

which gives us a uniform notation for all Earley items. Thus the formal
definition of the item set for a particular grammar $G$ is

$\mathcal{I}_{Earley}(G) = \{[A \to \langle \alpha \leadsto \underline{a}_{i+1} \ldots \underline{a}_j \rangle \beta] \mid A \to \alpha\beta \in P \wedge 0 \le i \le j\}$;

the operations can be defined accordingly. In order to establish the
semiregularity of this set, we have to check that items are pairwise disjoint and that
mixed items do not occur. All these properties follow straightforwardly from
the definition. □
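
The Earley parsing system can be read directly as a closure computation. The following Python sketch is an illustration under stated assumptions, not Earley's algorithm with its particular control structure: a grammar is encoded as a dictionary from nonterminals to lists of right-hand sides (tuples of symbols), every symbol that is not a key of the dictionary counts as a terminal, and an item $[A \to \alpha{\bullet}\beta, i, j]$ is encoded as a tuple `(A, rhs, dot, i, j)`. The names `earley_valid_items` and `earley_recognizes` are hypothetical.

```python
def earley_valid_items(grammar, start, words):
    """Compute the valid Earley items for `words` by closure under
    the init, scan, complete and predict steps."""
    n = len(words)
    valid = set()
    # Init: |- [S -> .gamma, 0, 0]
    agenda = [(start, rhs, 0, 0, 0) for rhs in grammar[start]]
    while agenda:
        item = agenda.pop()
        if item in valid:
            continue
        valid.add(item)
        lhs, rhs, dot, i, j = item
        if dot < len(rhs):
            symbol = rhs[dot]
            if symbol in grammar:
                # Predict: [A -> alpha.B beta, i, j] |- [B -> .gamma, j, j]
                for gamma in grammar[symbol]:
                    agenda.append((symbol, gamma, 0, j, j))
                # Complete, with the new item as first antecedent.
                for (b, gamma, d, j2, k) in list(valid):
                    if b == symbol and d == len(gamma) and j2 == j:
                        agenda.append((lhs, rhs, dot + 1, i, k))
            elif j < n and words[j] == symbol:
                # Scan, using the hypothesis [a, j, j+1].
                agenda.append((lhs, rhs, dot + 1, i, j + 1))
        else:
            # Complete, with the new item as second antecedent.
            for (a, alpha, d, h, i2) in list(valid):
                if i2 == i and d < len(alpha) and alpha[d] == lhs:
                    agenda.append((a, alpha, d + 1, h, j))
    return valid


def earley_recognizes(grammar, start, words):
    """True if some final item [S -> gamma., 0, n] is valid."""
    n = len(words)
    return any(lhs == start and dot == len(rhs) and i == 0 and j == n
               for (lhs, rhs, dot, i, j)
               in earley_valid_items(grammar, start, words))
```

Because scan steps only fire for positions inside the string and prediction starts from position 0, this closure never produces items with position markers beyond $n$.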

The parsing schema Earley is an abstraction not only of Earley’s algorithm. 
In Chapter 12 we will show that it is also the underlying parsing schema of 
an LR(0) parser. 
Before we continue with the examples, one more point of friction between
theory and practice has to be cleared up. As a persistent design choice, we
have formulated parsing systems $\langle \mathcal{I}, H, D \rangle$ in such a way that $\mathcal{I}$ and $D$ depend
on the grammar $G$ but not on the string and $H$ depends on the string
$a_1 \ldots a_n$ but not on the grammar. A consequence of this design choice is that
some parsing systems contain a (countably infinite) number of items with
position markers beyond the length of the string that is to be parsed. In the
bottom-up version of Earley, for example, any item $[A \to {\bullet}\alpha, j, j]$ is valid. The
existence of this item is a consequence of the grammar, not a consequence of
the sentence. Hence it can be derived by a deduction step without antecedents
$\vdash [A \to {\bullet}\alpha, j, j]$.

Practically speaking, this is no problem at all. It is obvious that any
sensible implementation would only consider recognizing those valid items
that fall within the positions spanned by the sentence. From a theoretical
perspective, however, this design is not elegant. Several solutions can be
considered.

The first option is to consider items of the form $[A \to {\bullet}\alpha, j, j]$ as hypotheses,
rather than consequents of antecedentless deduction steps. Then $H$ contains
such items only with $j \le n$, and the problem has been solved. A minor
disadvantage of this approach is that definitions of parsing schemata would
have to be based on instantiated, rather than uninstantiated parsing systems.
Hypotheses now depend on $G$ as well as on $a_1 \ldots a_n$. A more substantial nuisance
would be that we introduce a degree of freedom in the specification of parsing
systems, allowing some kind of information to be coded either in $H$ or in $D$.
This leads to syntactically different, equivalent denotations of a single schema.
As a consequence, we would have to define a normal form for the equivalence
relation in order to compare (normal forms of) different parsing schemata.

A second option is to replace antecedentless deduction steps $\vdash [A \to {\bullet}\alpha, j, j]$
by deduction steps $[\$, n, n+1] \vdash [A \to {\bullet}\alpha, j, j]$ only for $j \le n$. While this would
be adequate for the examples here, the ad hoc character of such a definition
causes problems in Chapter 6 where we discuss filtering. It would prohibit
an elegant description of the Earley schema as a top-down filtered variant of
the bottom-up Earley schema.

As in previous cases we will choose a pragmatic solution, simply by arguing
that the problem is not relevant for really existing parsers. Rather than
the set of valid items $\mathcal{V}(\mathbb{P})$ we restrict our attention to the subset of relevant
valid items $\mathcal{V}^{\le n}(\mathbb{P})$ for a sentence of length $n$.

Definition 4.33. (relevant valid items)

Let $\mathbb{P} = \langle \mathcal{I}, H, D \rangle$ be a parsing system. An item $\iota \in \mathcal{I}$ is irrelevant for (a
string of length) $n$ if every tree $\tau \in \iota$ contains some marked terminal $\underline{a}_j$ with
$j > n$ or is a left- or right-marked tree (cf. Definition 3.12) marked with some
$j > n$. We write $\mathcal{I}^{>n}$ for the irrelevant items of $\mathcal{I}$.

The set of relevant items $\mathcal{I}^{\le n}$ is defined by $\mathcal{I}^{\le n} = \mathcal{I} \setminus \mathcal{I}^{>n}$.

The set of relevant valid items $\mathcal{V}^{\le n}$ is given by $\mathcal{V}^{\le n} = \mathcal{V} \cap \mathcal{I}^{\le n}$. □
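In the following examples of parsing schemata we will only be concerned with
relevant valid items.

Example 4.34. (the buE parsing schema)

An Earley parser proceeds through a sentence from left to right. A bottom-up
parallel Earley parser can start at each word in the sentence in parallel. To that
end, a larger set of initial deduction steps is added and the predict steps are
discarded.

As usual, we define the parsing schema buE by specifying a parsing system
$\mathbb{P}_{buE}$ for an arbitrary grammar $G$:

$\mathcal{I}_{buE} = \{[A \to \alpha{\bullet}\beta, i, j] \mid A \to \alpha\beta \in P \wedge 0 \le i \le j\} = \mathcal{I}_{Earley}$;

$D^{Init} = \{\vdash [A \to {\bullet}\alpha, j, j]\}$,

$D^{Scan} = \{[A \to \alpha{\bullet}a\beta, i, j], [a, j, j+1] \vdash [A \to \alpha a{\bullet}\beta, i, j+1]\} = D^{Scan}_{Earley}$,

$D^{Compl} = \{[A \to \alpha{\bullet}B\beta, i, j], [B \to \gamma{\bullet}, j, k] \vdash [A \to \alpha B{\bullet}\beta, i, k]\} = D^{Compl}_{Earley}$,

$D_{buE} = D^{Init} \cup D^{Scan} \cup D^{Compl}$.

It is left to the reader to verify that the set of relevant valid items is given by

$\mathcal{V}^{\le n}(\mathbb{P}_{buE}) = \{[A \to \alpha{\bullet}\beta, i, j] \in \mathcal{I}^{\le n}_{buE} \mid \alpha \Rightarrow^* a_{i+1} \ldots a_j\}$.

It is obvious, again, that from final items $[S \to \gamma{\bullet}, 0, n]$ for a string $a_1 \ldots a_n$,
only those are valid for which $\gamma \Rightarrow^* a_1 \ldots a_n$. Irrelevant items surely do not
contain parse trees, hence we conclude that the parsing schema is correct. □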
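
The restriction to relevant items is what makes the bottom-up schema computable in practice: the antecedentless initial steps are instantiated only for positions $0 \le j \le n$. The sketch below illustrates this with a naive fixpoint iteration. As before, the dictionary encoding of the grammar, the tuple encoding `(A, rhs, dot, i, j)` of items and the function name are assumptions made for the example.

```python
def bue_relevant_valid_items(grammar, words):
    """Relevant valid items of the bottom-up Earley system for `words`."""
    n = len(words)
    # Initial steps |- [A -> .alpha, j, j], restricted to positions 0..n.
    valid = {(a, rhs, 0, j, j)
             for a, rhss in grammar.items()
             for rhs in rhss
             for j in range(n + 1)}
    changed = True
    while changed:
        changed = False
        new = set()
        for (a, rhs, dot, i, j) in valid:
            if dot == len(rhs):
                continue
            symbol = rhs[dot]
            # Scan: the next symbol is a terminal matching word j+1.
            if symbol not in grammar and j < n and words[j] == symbol:
                new.add((a, rhs, dot + 1, i, j + 1))
            # Complete: a finished item for `symbol` starts at position j.
            for (b, gamma, d, j2, k) in valid:
                if b == symbol and d == len(gamma) and j2 == j:
                    new.add((a, rhs, dot + 1, i, k))
        if not new <= valid:
            valid |= new
            changed = True
    return valid
```

Unlike the Earley system, this closure also recognizes constituents that cannot be reached top-down from the start symbol, because no prediction is used to restrict the initial items.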



Example 4.35. (the buLC parsing schema)

The parsing schema buE is correct but it contains some slight redundancies.®
Suppose that we have a valid item $[A \to B{\bullet}\beta, i, j]$. How is such an item
deduced? The only way to establish the validity of this item is by using a valid
item $[B \to \gamma{\bullet}, i, j]$ as an antecedent in the complete step

$[A \to {\bullet}B\beta, i, i], [B \to \gamma{\bullet}, i, j] \vdash [A \to B{\bullet}\beta, i, j]$.

The item $[A \to {\bullet}B\beta, i, i]$ does not play any significant role in the bottom-up
variant of Earley’s algorithm; it is valid by definition. No harm is done if we
delete it as an antecedent and replace the complete step by an - in this case
- equivalent deduction step

$[B \to \gamma{\bullet}, i, j] \vdash [A \to B{\bullet}\beta, i, j]$,

as illustrated in Figure 4.2.

A similar argument applies to items of the form $[A \to a{\bullet}\beta, i, j]$ and the
appropriate scan step. Hence, most items with a dot in leftmost position serve
no purpose, other than satisfying the buE specification for historic reasons.

® This optimization was proposed by Kilbury [1984], cf. [Leiss, 1990]. 
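(It should be noted, though, that items with a dot in leftmost position are
indispensable if the right-hand side of the production is empty. The above
argument does not relate to items of the form $[A \to {\bullet}, j, j]$.)

Based on these considerations, we define a parsing schema that is very similar
to buE but slightly more economic. This schema is called buLC, which is
an abbreviation of a bottom-up Left Corner schema. As usual, we specify a
parsing system for an arbitrary grammar $G \in \mathcal{CFG}$, as follows:

$\mathcal{I}^{(1)} = \{[A \to X\alpha{\bullet}\beta, i, j] \mid A \to X\alpha\beta \in P \wedge 0 \le i \le j\}$,

$\mathcal{I}^{(2)} = \{[A \to {\bullet}, j, j] \mid A \to \varepsilon \in P \wedge j \ge 0\}$,

$\mathcal{I}_{buLC} = \mathcal{I}^{(1)} \cup \mathcal{I}^{(2)}$;

$D^{\varepsilon} = \{\vdash [A \to {\bullet}, j, j]\}$,

$D^{LC(a)} = \{[a, j-1, j] \vdash [B \to a{\bullet}\beta, j-1, j]\}$,

$D^{LC(A)} = \{[A \to \alpha{\bullet}, i, j] \vdash [B \to A{\bullet}\beta, i, j]\}$,

$D^{Scan} = \{[A \to \alpha{\bullet}a\beta, i, j], [a, j, j+1] \vdash [A \to \alpha a{\bullet}\beta, i, j+1]\}$,

$D^{Compl} = \{[A \to \alpha{\bullet}B\beta, i, j], [B \to \gamma{\bullet}, j, k] \vdash [A \to \alpha B{\bullet}\beta, i, k]\}$,

$D_{buLC} = D^{\varepsilon} \cup D^{LC(a)} \cup D^{LC(A)} \cup D^{Scan} \cup D^{Compl}$.

From the above discussion it follows that

$\mathcal{V}^{\le n}(\mathbb{P}_{buLC}) = \mathcal{V}^{\le n}(\mathbb{P}_{buE}) \cap \mathcal{I}_{buLC}$

$= \{[A \to \alpha{\bullet}\beta, i, j] \in \mathcal{I}^{\le n}_{buLC} \mid \alpha \Rightarrow^* a_{i+1} \ldots a_j \wedge (\alpha \neq \varepsilon \vee \beta = \varepsilon)\}$

and that the buLC schema is correct. □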

Example 4.36. (the LC parsing schema)

In the above example we defined buLC as a slightly more economic version of
buE. If a constituent has been recognized completely, i.e., we found an item
$[B \to \gamma{\bullet}, i, j]$, we use a left-corner step and recognize an item $[A \to B{\bullet}\beta, i, j]$.
This could be done, because, in the buE schema, the item $[A \to {\bullet}B\beta, i, i]$ is
always valid. If we try the same transformation on the (left-to-right) Earley
schema, things get slightly more complicated. It is not the case that
$[A \to {\bullet}B\beta, i, i]$ is always valid. Hence, the replacement of a deduction

$[A \to {\bullet}B\beta, i, i], [B \to \gamma{\bullet}, i, j] \vdash [A \to B{\bullet}\beta, i, j]$

by a deduction $[B \to \gamma{\bullet}, i, j] \vdash [A \to B{\bullet}\beta, i, j]$ should be allowed only in those
cases where $[A \to {\bullet}B\beta, i, i]$ is actually valid. Under which conditions is this
the case? The item $[A \to {\bullet}B\beta, i, i]$ is predicted by Earley only if there is some
item of the form

$[C \to \alpha{\bullet}A\delta, h, i]$.

Fig. 4.3. The (predictive) left-corner step

It could be the case, however, that $\alpha = \varepsilon$. Then this is one of the items that is
not contained in the domain of the buLC schema and we continue the search
for an item that licences the validity of $[C \to {\bullet}A\delta, i, i]$. This search can end in
two ways: either we find some item with the dot not in leftmost position, or
(only in case $i = 0$) we may move all the way up to $[S \to {\bullet}\gamma, 0, 0]$. This can be
formalized as follows.

The left corner is the leftmost symbol in the right-hand side of a production.
$A \to X\alpha$ has left corner $X$; an empty production $A \to \varepsilon$ has left corner $\varepsilon$.

The relation $>_{\ell}$ on $N \times (V \cup \{\varepsilon\})$ is defined by

$A >_{\ell} U$ if there is a production $A \to \alpha \in P$ which has left corner $U$.

The transitive and reflexive closure of $>_{\ell}$ is denoted $>^*_{\ell}$.
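
The closure of the left-corner relation can be precomputed from the grammar. The following Python sketch shows one straightforward way to do so; the dictionary encoding of the grammar and the use of the empty string for $\varepsilon$ are assumptions made for the example. Reflexive pairs are added for nonterminals only, which is all that the conditions of the form $E >^*_{\ell} A$ require.

```python
EPSILON = ""


def left_corner_closure(grammar):
    """Return the set of pairs (A, U) such that A >*_l U.

    grammar: dict mapping each nonterminal to a list of right-hand
    sides, each a tuple of symbols (the empty tuple for A -> epsilon).
    """
    # Direct left corners: A >_l U for every production A -> U alpha.
    direct = {(a, rhs[0] if rhs else EPSILON)
              for a, rhss in grammar.items() for rhs in rhss}
    # Reflexive pairs for nonterminals, plus the direct relation.
    closure = {(a, a) for a in grammar} | direct
    # Transitive closure by repeated composition with the direct relation.
    changed = True
    while changed:
        changed = False
        for (a, b) in list(closure):
            for (b2, u) in direct:
                if b2 == b and (a, u) not in closure:
                    closure.add((a, u))
                    changed = True
    return closure
```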

It is clear that $[A \to {\bullet}B\beta, i, i]$ will be recognized by the Earley algorithm if
there is some valid item $[C \to \alpha{\bullet}E\delta, h, i]$ with $E >^*_{\ell} A$. Moreover, there is
such an item with $\alpha \neq \varepsilon$, unless, perhaps, $i = 0$ and $E = S$. In order to
deal with this exceptional case, we must make sure that items of the form
$[S \to {\bullet}\gamma, 0, 0]$ are in the domain; all other items of the form $[A \to {\bullet}\alpha, i, i]$ with
$\alpha \neq \varepsilon$ are dispensable. To replace the missing complete steps, we introduce
left-corner steps as follows:

$[C \to \alpha{\bullet}E\delta, h, i], [B \to \gamma{\bullet}, i, j] \vdash [A \to B{\bullet}\beta, i, j]$ only if $E >^*_{\ell} A$.

A schematic illustration is shown in Figure 4.3.

A similar argument holds for items of the form $[A \to a{\bullet}\beta, j-1, j]$.

Thus we obtain a full formal description of the LC schema, as usual by
defining a parsing system for arbitrary $G \in \mathcal{CFG}$.

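$\mathcal{I}^{(1)} = \{[A \to X\alpha{\bullet}\beta, i, j] \mid A \to X\alpha\beta \in P \wedge 0 \le i \le j\}$,

$\mathcal{I}^{(2)} = \{[A \to {\bullet}, j, j] \mid A \to \varepsilon \in P \wedge j \ge 0\}$,

$\mathcal{I}^{(3)} = \{[S \to {\bullet}\gamma, 0, 0] \mid S \to \gamma \in P\}$,

$\mathcal{I}_{LC} = \mathcal{I}^{(1)} \cup \mathcal{I}^{(2)} \cup \mathcal{I}^{(3)}$;

$D^{Init} = \{\vdash [S \to {\bullet}\gamma, 0, 0]\}$,

$D^{LC(A)} = \{[C \to \gamma{\bullet}E\delta, h, i], [A \to \alpha{\bullet}, i, j] \vdash [B \to A{\bullet}\beta, i, j] \mid E >^*_{\ell} B\}$,

$D^{LC(a)} = \{[C \to \gamma{\bullet}E\delta, h, i], [a, i, i+1] \vdash [B \to a{\bullet}\beta, i, i+1] \mid E >^*_{\ell} B\}$,

$D^{LC(\varepsilon)} = \{[C \to \gamma{\bullet}E\delta, h, i] \vdash [B \to {\bullet}, i, i] \mid E >^*_{\ell} B\}$,

$D^{Scan} = \{[A \to \alpha{\bullet}a\beta, i, j], [a, j, j+1] \vdash [A \to \alpha a{\bullet}\beta, i, j+1]\}$,

$D^{Compl} = \{[A \to \alpha{\bullet}B\beta, i, j], [B \to \gamma{\bullet}, j, k] \vdash [A \to \alpha B{\bullet}\beta, i, k]\}$,

$D_{LC} = D^{Init} \cup D^{LC(A)} \cup D^{LC(a)} \cup D^{LC(\varepsilon)} \cup D^{Scan} \cup D^{Compl}$.

From the above discussion it follows that

$\mathcal{V}^{\le n}(\mathbb{P}_{LC}) = \mathcal{V}^{\le n}(\mathbb{P}_{Earley}) \cap \mathcal{I}_{LC}$

$= \{[A \to \alpha{\bullet}\beta, i, j] \in \mathcal{I}^{\le n}_{LC} \mid \alpha \Rightarrow^* a_{i+1} \ldots a_j \wedge S \Rightarrow^* a_1 \ldots a_i A \gamma$ for some $\gamma\}$.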

It should be mentioned that this schema reflects the (generalized) Left-Corner
algorithm as it has been described in the literature. Deterministic LC parsing
has been defined by Rosenkrantz and Lewis [1970]. See also [op den Akker,
1988]. Generalized LC parsers have been described by Matsumoto [1983],
Nederhof [1993], and Sikkel and op den Akker [1992, 1993].

When it comes down to implementing this schema, the efficiency can be
increased by adding predict items of the form $[D, i]$, denoting the
fact that some item of the form $[C \to \alpha{\bullet}D\delta, h, i]$ has been found, abstracting
from items in $\mathcal{I}$ similar to the way in which items abstract from trees in $\mathcal{T}$.
A more detailed treatment will be given in Chapter 10. □
There are some obvious relations between the parsing schemata buE,
Earley, buLC and LC. The definitions of these parsing schemata are not 
“stand-alone” definitions, in a way. We have defined buLC and LC, infor- 
mally, by applying transformations to the parsing schemata buE and Earley. 
Subsequently we have given formal definitions that satisfy the intuitive un- 
derstanding. In Chapters 5 and 6 we will formalize such transformations and 
discuss under which conditions the correctness of a schema is invariant under 
a transformation. 



4.7 Conclusion 

We have generalized the theory of tree-based parsing schemata of Chapter 3 
to (item-based) parsing schemata. Tree-based parsing schemata can be seen
as a special case in which every item comprises a single tree. 

In this chapter we have seen that there is a tension between theoretical 
elegance and pragmatic convenience. In order to cover schemata for pars- 
ing algorithms that appear in the literature we have gone so far as to allow 
items that are inconsistent with the (most elegant) underlying theory. Sub- 
sequently we have argued that the difference is not relevant for practical 
parsers. Semiregularity is a rather natural property of parsing schemata. A 
minor problem, but nevertheless another sore point, is the distinction be- 
tween relevant and irrelevant valid items. Here we have settled for a minor 
practical inconvenience in order to avoid a major theoretical inelegance. In 
both cases we have extensively motivated our design choices and we have 
argued that different choices, looking like plausible alternatives, have more 
serious drawbacks. 

These frictions in the theory are caused by the sometimes incompatible 
interests of theory and practice. If we looked at it from a purely theoretical
perspective, it would be very simple to come up with a smaller and rather
more elegant theory, in which only regular parsing schemata are considered. 
Our major concern, however, is that the theory can be applied for the de- 
scription of practical parsers; the theory is not a purpose in itself. That the 
parsing schemata framework can be applied to describe a variety of parsers 
will be shown in Chapter 6, where half a dozen parsing algorithms known 
from the literature are fitted into a single taxonomy of Earley-related pars- 
ing schemata. More involved applications will follow in Part III. 
In Chapters 3 and 4 we have formally established the notion of a parsing
schema, and presented some examples. In this chapter and the next one we 
will discuss relations between parsing schemata. 

The main notion that we are concerned with here is refinement. A parsing 
schema is a refinement of another schema when it allows more deductions 
or when it performs the same deductions in smaller steps. This notion has 
a twofold application. Firstly, we can identify some chains of refinements, 
describing schemata for parsers that exist in the literature. Secondly, if a 
parser is known to be correct, the refinement relation can be used to prove 
the correctness of another parser. 

In Section 5.1, we go back to the more abstract setting of enhanced de- 
duction systems and establish some general notions like homomorphism and 
isomorphism. Next, in 5.2, we formally introduce the notion of refinement for 
parsing systems and schemata. Some examples of refinements are presented 
in 5.3. In 5.4 we introduce generalization, i.e., applying a parsing schema to a
larger class of grammars. Generalization usually includes refinement as well. 

In Chapter 6, subsequently, we will study the notion of a filter. Filters 
are, in a general sense, the inverse of refinement: a filtered system makes fewer
deductions or contracts sequences of deductions to single deduction steps. 
Using refinements and filters, a large variety of parsers can be described 
within a single taxonomy. In Section 6.6 an overview is given that summarizes 
the relation between the different kinds of relations introduced in Chapters 
5 and 6. 



5.1 Mappings between deduction systems 

In Section 5.1 we are concerned with mappings between arbitrary enhanced
deduction systems, say $\mathbb{E}_1$ and $\mathbb{E}_2$. In each case we will assume that
$\mathbb{E}_1 = \langle \mathcal{I}_1, H_1, \mathcal{F}_1, \mathcal{C}_1, D_1 \rangle$ and $\mathbb{E}_2 = \langle \mathcal{I}_2, H_2, \mathcal{F}_2, \mathcal{C}_2, D_2 \rangle$. Furthermore, we write
$\Delta_1$ for $\Delta(\mathbb{E}_1)$, $\Delta_2$ for $\Delta(\mathbb{E}_2)$, $\mathcal{V}_1$ for $\mathcal{V}(\mathbb{E}_1)$ and $\mathcal{V}_2$ for $\mathcal{V}(\mathbb{E}_2)$. Similar definitions
apply to deduction systems $\mathbb{D}_1$ and $\mathbb{D}_2$; the only thing that has to be changed
is deleting all conditions on $\mathcal{F}_i$ and $\mathcal{C}_i$. But we are primarily interested in
enhanced systems here, because an interesting aspect of mappings between
deduction systems is whether correctness is preserved.

Definition 5.1. (pointwise extensions of a function)

Let $\mathbb{E}_1$ and $\mathbb{E}_2$ be deduction systems, and $f : H_1 \cup \mathcal{I}_1 \to H_2 \cup \mathcal{I}_2$ a function.
We can define a function $\bar{f} : \wp(H_1 \cup \mathcal{I}_1) \to \wp(H_2 \cup \mathcal{I}_2)$ that maps sets of
entities into sets of entities by pointwise application, i.e.,

$\bar{f}(Y) = \{x_2 \in H_2 \cup \mathcal{I}_2 \mid \exists x_1 \in Y : f(x_1) = x_2\}$.

A function $f$ that maps $H_1 \cup \mathcal{I}_1$ to $H_2 \cup \mathcal{I}_2$ can also be extended to a function

$f' : \wp_{fin}(H_1 \cup \mathcal{I}_1) \times \mathcal{I}_1 \to \wp_{fin}(H_2 \cup \mathcal{I}_2) \times \mathcal{I}_2$

that maps deduction steps to deduction steps by pointwise application:

$f'(Y, x) = (\bar{f}(Y), f(x))$.

$f'$ can be extended to $\bar{f}'$, similarly, mapping sets of deduction steps into sets
of deduction steps, by analogy to the extension of $f$ into $\bar{f}$.

In a similar vein, we can extend $f$ into a function $f''$ that maps deduction
sequences in $\mathbb{E}_1$ to deduction sequences in $\mathbb{E}_2$. (Note, however, that there is
a consistency issue here, because it is not generally guaranteed that
$f''(Y \vdash x_1 \vdash \ldots \vdash x_j)$ is a valid deduction sequence in $\mathbb{E}_2$!) And, finally, we can map
sets of deduction sequences into sets of deduction sequences by a function
$\bar{f}''$. When no confusion can arise about the domain of a function, we simply
write $f$ for $f'$, $f''$, $\bar{f}$, $\bar{f}'$ and $\bar{f}''$ as well. □

The purpose of the extended functions defined above is that we can use
them to state conditions on functions in a concise, well-defined and intuitively
clear manner. If we state, for example, that

$f(D_1) = D_2$

this means

$(Y_2, x_2) \in D_2$ if and only if there are $Y_1 \in \wp_{fin}(H_1 \cup X_1)$ and $x_1 \in X_1$
such that $f(Y_1) = Y_2$ and $f(x_1) = x_2$ and $(Y_1, x_1) \in D_1$.

Similarly,

$f(\Delta_1) = \Delta_2$

is a clear and concise notation for

$Y_2 \vdash_2 x_1 \vdash_2 \ldots \vdash_2 x_j$ if and only if there are $Y_1 \in \wp_{fin}(H_1 \cup X_1)$ with
$f(Y_1) = Y_2$ and $x'_1, \ldots, x'_j \in X_1$ with $f(x'_i) = x_i$ for $1 \le i \le j$ such that
$Y_1 \vdash_1 x'_1 \vdash_1 \ldots \vdash_1 x'_j$.
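
The pointwise extensions can be made concrete with a small sketch. It is only an illustration under assumptions that are not part of the text: entities are encoded as Python tuples, a deduction step is a pair (Y, x) of a frozen set of antecedents and a consequent, and the function `forget_production` is a hypothetical example of a mapping that drops the production from an item label.

```python
from typing import Callable, FrozenSet, Hashable, Iterable, Set, Tuple

Entity = Hashable
Step = Tuple[FrozenSet[Entity], Entity]  # (Y, x): antecedents Y, consequent x


def extend_to_sets(f: Callable[[Entity], Entity]):
    """Pointwise extension of f to (finite) sets of entities."""
    return lambda ys: frozenset(f(y) for y in ys)


def extend_to_steps(f: Callable[[Entity], Entity]):
    """Pointwise extension of f to deduction steps (Y, x)."""
    f_set = extend_to_sets(f)
    return lambda step: (f_set(step[0]), f(step[1]))


def image_of_steps(f: Callable[[Entity], Entity], steps: Iterable[Step]) -> Set[Step]:
    """Image of a set of deduction steps under the step extension of f."""
    f_step = extend_to_steps(f)
    return {f_step(s) for s in steps}


def forget_production(item):
    """Hypothetical mapping: 'A->a' becomes 'A'; terminal labels are unchanged."""
    label, i, j = item
    return (label.split("->")[0], i, j)


# A toy set of deduction steps D1 over refined items (an assumption for illustration).
D1 = {
    (frozenset({("a", 0, 1)}), ("A->a", 0, 1)),
    (frozenset({("b", 1, 2)}), ("B->b", 1, 2)),
    (frozenset({("A->a", 0, 1), ("B->b", 1, 2)}), ("S->AB", 0, 2)),
}
D2 = image_of_steps(forget_production, D1)
```

Stating $f(D_1) = D_2$ then amounts to comparing `image_of_steps(f, D1)` with the set of steps of the second system.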

Mappings, and other relations, between deduction systems can have several
interesting properties. First of all, the usual properties of relations may
apply. Relations like refinement, extension, generalization and various types
of filters are all transitive and reflexive. Reflexivity is always trivial,
transitivity only sometimes. In Section 5.2 we will see that transitivity of
refinement is not straightforward. Other properties that relations may have
are preservation of soundness / completeness / correctness. We discuss
relations between deduction systems here, as opposed to relations on (the
domain of) a single deduction system.

Definition 5.2. (preservation properties of relations)

Let $E_1$ and $E_2$ be arbitrary enhanced deduction systems. A relation $R$ between
deduction systems is called soundness / completeness / correctness
preserving if $E_1 \, R \, E_2$ and the soundness / completeness / correctness of $E_1$
are sufficient conditions for the soundness / completeness / correctness of $E_2$.
Let $\mathbf{P}_1$ and $\mathbf{P}_2$ be arbitrary semiregular parsing schemata for some class of
grammars $\mathcal{CG}$. A relation $R$ between parsing schemata is called soundness /
completeness / correctness preserving if $\mathbf{P}_1 \, R \, \mathbf{P}_2$ and the soundness /
completeness / correctness of $\mathbf{P}_1$ are sufficient conditions for the soundness /
completeness / correctness of $\mathbf{P}_2$. □

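Definition 5.3. (homomorphism)

A function $f : H_1 \cup X_1 \to H_2 \cup X_2$ is called a homomorphism from $E_1$ to $E_2$
if:

(i) $f(H_1) \subseteq H_2$,

(ii) $f(X_1 \setminus F_1) \subseteq X_2 \setminus F_2$,

(iii) $f(F_1) \subseteq F_2$,

(iv) $f(C_1) \subseteq C_2$,

(v) $f(D_1) \subseteq D_2$. □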
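
For finite systems the five conditions of Definition 5.3 can be checked mechanically. The following is a minimal sketch, assuming a hypothetical `EnhancedSystem` record with fields X, H, F, C and D, and a toy pair of systems for the string ab that is made up for the illustration; it is not a construction taken from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EnhancedSystem:
    X: frozenset  # entities (items)
    H: frozenset  # hypotheses
    F: frozenset  # final entities
    C: frozenset  # correct final entities
    D: frozenset  # deduction steps (Y, x)


def image(f, entities):
    """Pointwise image of a set of entities."""
    return frozenset(f(e) for e in entities)


def step_image(f, steps):
    """Pointwise image of a set of deduction steps."""
    return frozenset((image(f, Y), f(x)) for (Y, x) in steps)


def is_homomorphism(f, e1, e2):
    """Check conditions (i)-(v) of Definition 5.3 on finite systems."""
    return (
        image(f, e1.H) <= e2.H
        and image(f, e1.X - e1.F) <= e2.X - e2.F
        and image(f, e1.F) <= e2.F
        and image(f, e1.C) <= e2.C
        and step_image(f, e1.D) <= e2.D
    )


def forget(item):
    """Hypothetical mapping: drop the production from an item label."""
    label, i, j = item
    return (label.split("->")[0], i, j)


H = frozenset({("a", 0, 1), ("b", 1, 2)})
E1 = EnhancedSystem(
    X=frozenset({("A->a", 0, 1), ("B->b", 1, 2), ("S->AB", 0, 2)}),
    H=H,
    F=frozenset({("S->AB", 0, 2)}),
    C=frozenset({("S->AB", 0, 2)}),
    D=frozenset({
        (frozenset({("a", 0, 1)}), ("A->a", 0, 1)),
        (frozenset({("b", 1, 2)}), ("B->b", 1, 2)),
        (frozenset({("A->a", 0, 1), ("B->b", 1, 2)}), ("S->AB", 0, 2)),
    }),
)
E2 = EnhancedSystem(X=image(forget, E1.X), H=H, F=image(forget, E1.F),
                    C=image(forget, E1.C), D=step_image(forget, E1.D))
# Removing the step that derives [S, 0, 2] from E2 violates condition (v).
E2_missing_step = EnhancedSystem(
    X=E2.X, H=E2.H, F=E2.F, C=E2.C,
    D=frozenset(s for s in E2.D if s[1] != ("S", 0, 2)))

ok = is_homomorphism(forget, E1, E2)
broken = is_homomorphism(forget, E1, E2_missing_step)
```

Replacing each `<=` by `==` gives the surjective variant of the conditions.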

Definition 5.4. (isomorphism)

A homomorphism $f : X_1 \cup H_1 \to X_2 \cup H_2$ is called an isomorphism from $E_1$ to
$E_2$ if an inverse function $f^{-1} : X_2 \to X_1$ exists and $f^{-1}$ is a homomorphism
from $E_2$ to $E_1$.

As a practical notation we write $X_1 \cong_f X_2$ if $f$ is a bijective function from
$X_1$ to $X_2$. We write $E_1 \cong_f E_2$ if $f$ is an isomorphism from $E_1$ to $E_2$. We write
$E_1 \cong E_2$ if there is a function $f$ such that $E_1 \cong_f E_2$.

Two parsing schemata $\mathbf{P}_1$ and $\mathbf{P}_2$ are isomorphic on a class of grammars $\mathcal{CG}$
if for each $G \in \mathcal{CG}$ and for each $a_1 \ldots a_n \in \Sigma^*$ it holds that

$\mathbf{P}_1(G)(a_1 \ldots a_n) \cong \mathbf{P}_2(G)(a_1 \ldots a_n)$.

We write $\mathbf{P}_1 \cong \mathbf{P}_2$ if $\mathbf{P}_1$ and $\mathbf{P}_2$ are isomorphic. □



The inverse of an isomorphism is also an isomorphism. Furthermore, an
isomorphism is correctness preserving. A homomorphism, in general, is not
correctness preserving. The soundness can be violated by adding deduction
steps to $E_2$ that validate entities in $F_2 \setminus C_2$. The completeness can be violated
by adding new, invalid entities to $C_2$.

Preservation of completeness can be guaranteed by demanding that
the homomorphism be surjective, i.e., $f(H_1) = H_2$, $f(X_1 \setminus F_1) = X_2 \setminus F_2$,
$f(F_1) = F_2$, $f(C_1) = C_2$, $f(D_1) = D_2$. As soundness is always the trivial
part of a proof, completeness preservation is almost as useful as correctness
preservation. But in the sequel we will make much use of a slightly more
restricted kind of homomorphism that does preserve correctness as well.

Definition 5.5. (item contraction function)

A function $f : X_1 \cup H_1 \to X_2 \cup H_2$ is called an item contraction from $E_1$ to
$E_2$ if

(i) $H_1 \cong_f H_2$,
(ii) $f(X_1 \setminus F_1) = X_2 \setminus F_2$,
(iii) $f(F_1) = F_2$,
(iv) $f(C_1) = C_2$,
(v) $f(\Delta_1) = \Delta_2$. □

The reason that we demand that deduction sequences are mapped onto
deduction sequences, i.e., $f(\Delta_1) = \Delta_2$, rather than $f(D_1) = D_2$, will become
clear in Section 5.2 where item refinement, the inverse of item contraction, is
defined. The mapping of deduction sequences, rather than deduction steps,
is needed to establish transitivity of a more general notion of refinement that
also includes step refinement.

Corollary 5.6.

Let $E$ be an enhanced deduction system, $\cong$ a regular congruence relation on
$E$. Let $f_{\cong} : E \to E/\!\cong$ be the canonical function that maps $x$ onto $[x]_{\cong}$. Then
$f_{\cong}$ is an item contraction function. □



We can apply item contraction directly to parsing schemata, but it only 
makes sense to do so on regular schemata. We can extend the idea to semireg- 
ular parsing schemata by only considering the regular subschemata. 

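Definition 5.7. (the relations $\stackrel{ic}{\longrightarrow}$ and $\stackrel{ic}{\Longrightarrow}$)

Let $\mathbb{P}_1$ and $\mathbb{P}_2$ be semiregular parsing systems. The relation $\mathbb{P}_1 \stackrel{ic}{\longrightarrow} \mathbb{P}_2$ holds
if there is an item contraction function $f$ between the regular subsystems
of $\mathbb{P}_1$ and $\mathbb{P}_2$.

Let $\mathbf{P}_1$ and $\mathbf{P}_2$ be semiregular parsing schemata for some class of grammars
$\mathcal{CG}$. The relation $\mathbf{P}_1 \stackrel{ic}{\Longrightarrow} \mathbf{P}_2$ holds if, for each $G \in \mathcal{CG}$ and $a_1 \ldots a_n \in \Sigma^*$, it
holds that

$\mathbf{P}_1(G)(a_1 \ldots a_n) \stackrel{ic}{\longrightarrow} \mathbf{P}_2(G)(a_1 \ldots a_n)$. □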

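Corollary 5.8.

The relation $\stackrel{ic}{\longrightarrow}$ is transitive, reflexive, and correctness preserving;

The relation $\stackrel{ic}{\Longrightarrow}$ is transitive, reflexive, and correctness preserving. □

Corollary 5.9.

The following statements hold:

• Let $\mathbb{T}$ be a tree-based parsing system, $\cong$ a regular congruence relation on
$\mathbb{T}$. Then $\mathbb{T} \stackrel{ic}{\longrightarrow} \mathbb{T}/\!\cong$.

• Let $\mathbb{T}$ be a tree-based parsing system, $\cong_1$ and $\cong_2$ regular congruence
relations on $\mathbb{T}$ and $\cong_1 \subseteq \cong_2$ (i.e., $x \cong_1 x'$ implies $x \cong_2 x'$). Then
$\mathbb{T}/\!\cong_1 \stackrel{ic}{\longrightarrow} \mathbb{T}/\!\cong_2$.

• Let $\mathbf{P}_1$, $\mathbf{P}_2$ be semiregular parsing schemata, $\cong$ a regular congruence
relation on the regular subschema of $\mathbf{P}_1$, such that the quotient of that
subschema by $\cong$ equals the regular subschema of $\mathbf{P}_2$. Then $\mathbf{P}_1 \stackrel{ic}{\Longrightarrow} \mathbf{P}_2$. □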



5.2 Refinement: a formal approach 

We can see Earley-type algorithms as a refinement of CYK-type algorithms.
The latter recognize constituents, whereas the former also deal with partial
constituents. A single step in a CYK parser corresponds to several steps in
an Earley parser.

More generally, but still informally, refinement of parsing systems (and 
hence parsing schemata) can be seen as consisting of two steps: 

• item refinement: Some items are split up into smaller items; the set of 
deduction steps is adapted accordingly. 

• step refinement: Single deduction steps are refined into series of deduction 
steps. To this end, new items can be added as well. 

We will define item refinement and step refinement separately and afterwards 
define refinement in such a way that it is the simultaneous transitive closure 
of both kinds of refinement. 

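Definition 5.10. (item refinement)

Let $\mathbb{P}_1 = \langle \mathcal{I}_1, H, D_1 \rangle$ and $\mathbb{P}_2 = \langle \mathcal{I}_2, H, D_2 \rangle$ be semiregular parsing systems.
The relation $\mathbb{P}_1 \stackrel{ir}{\longrightarrow} \mathbb{P}_2$ holds if $\mathbb{P}_2 \stackrel{ic}{\longrightarrow} \mathbb{P}_1$.

Let $\mathbf{P}_1$, $\mathbf{P}_2$ be semiregular parsing schemata for a class of grammars $\mathcal{CG}$. The
relation $\mathbf{P}_1 \stackrel{ir}{\Longrightarrow} \mathbf{P}_2$ holds if $\mathbf{P}_2 \stackrel{ic}{\Longrightarrow} \mathbf{P}_1$. □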

Item refinement, in general, is the reverse of item contraction. But in the
remainder of Chapter 5 we are specifically concerned with parsing systems
and schemata, not deduction systems in general. In this more specific
setting, the conditions for item contraction and refinement can be simplified.
Firstly, we notice that the hypotheses will always be the same, hence the
condition $H_1 \cong_f H_2$ can be deleted. Secondly, we will introduce a simple
regularity constraint on functions that allows us to discard a few conditions
from Definition 5.5.

Definition 5.11. (regular item mapping)

Let $\mathbb{P}_1$, $\mathbb{P}_2$ be semiregular parsing systems. A function $f$ between their item
sets is called a regular item mapping if for all items $\iota$ in the domain of $f$
and all trees $\tau \in \iota$ it holds that $\tau \in f(\iota)$. □

Lemma 5.12. 

Let Pi, P2 be semiregular parsing systems. If there is a regular item mapping 
/ : Z2->X[ such that 

0) 

{ii) = f{Al), 
then Pi P2. 

Proof. It suffices to show that the following inequalities hold: 

(m) /(X2\4"’) C 

(it;) C /(l2\4"’). 

(«) 

(m) C 

(mi) /(C2)CCi, 

{viii) Cl C /(C2)- 

Inequalities (in), {v), and (mi) follow straight from the definition. 
Inequalities {iv) and (m) follow from (iii) and {v) in combination with (i) 
and the fact that is regular. 

For (via) we have to realize that U Ci = UC2 (both are equal to Vcicii . • . an) 
by definition). Take an arbitrary $\iota_1 \in C_1$ and let $\tau \in \iota_1$. Then there must be
some $\iota_2 \in C_2$ with $\tau \in \iota_2$. Because $f$ is a regular item mapping, it must hold
that $f(\iota_2) = \iota_1$. □

Example 5.13.

Item refinement is usually combined with step refinement. Therefore an
example of item refinement in isolation may seem somewhat artificial.

Consider the parsing schema CYK for grammars in $\mathcal{CNF}$, cf. Example 4.27.
We can replace items of the form $[A, i, j]$ by items of the form $[A \to \alpha, i, j]$,
for each production $A \to \alpha$. If there are different productions with the same
left-hand side, the CYK item is split up accordingly. Thus we get a parsing
schema CYK' by defining a system $\mathbb{P}_{CYK'}$ for arbitrary $G \in \mathcal{CNF}$:

$\mathcal{I}_{CYK'} = \{ [A \to \alpha, i, j] \mid A \to \alpha \in P \wedge 0 \le i < j \}$;

$D^{(1)} = \{ [a, j-1, j] \vdash [A \to a, j-1, j] \}$,

$D^{(2)} = \{ [B \to \beta, i, j], [C \to \gamma, j, k] \vdash [A \to BC, i, k] \}$,

$D_{CYK'} = D^{(1)} \cup D^{(2)}$.

It is left to the reader to verify that CYK' is a correct parsing schema. □

If $\mathbb{P}_1 \stackrel{ir}{\longrightarrow} \mathbb{P}_2$ then the correctness of $\mathbb{P}_2$ implies the correctness of $\mathbb{P}_1$
(Corollary 5.8). The reverse, however, is not true. Most item refinements
that are defined in a sensible manner will preserve correctness as well. But if
one really wants to refine a correct system into an incorrect one, that can be
done. An example of what can go wrong (only if mischief is intended) is the
following.

Let us refine CYK items into items $[A \to \alpha, i, j]$, where $[A \to \alpha, i, j]$
denotes a set of trees $[A \to \langle \alpha \leadsto a_{i+1} \ldots a_j \rangle]$. Suppose, now, that we have a
grammar

$S \to AB \mid BB$,
$A \to a$,
$B \to b$.

In the CYK system for this grammar we have a deduction step

$[A, 0, 1], [B, 1, 2] \vdash [S, 0, 2]$.

It is possible to refine this into a deduction step

$[A \to a, 0, 1], [B \to b, 1, 2] \vdash [S \to BB, 0, 2]$,

and refine the other deduction steps as in Example 5.13. The resulting system
is neither sound nor complete. For a string $ab$, the item $[S \to AB, 0, 2]$ that
contains the (only) parse tree will not be recognized, whereas the final item
$[S \to BB, 0, 2]$ that does not contain a parse tree is valid in this system.

A general method to make sure that item refinement is correctness
preserving is the following. Let $\mathbb{P}_1$ be a correct parsing system, and $\mathbb{T}$ the
underlying tree-based system of $\mathbb{P}_1$, i.e., there is some regular congruence relation
$\cong_1$ such that $\mathbb{P}_1 = \mathbb{T}/\!\cong_1$. If $\mathbb{P}_1$ is correct, it is usually not difficult to
establish that $\mathbb{T}$ is correct as well. One has to redo the correctness proof based
on trees, rather than items. If one defines a refinement $\mathbb{P}_2$ of $\mathbb{P}_1$ such that
$\mathbb{P}_2 = \mathbb{T}/\!\cong_2$, then $\mathbb{P}_2$ must be correct as well. This is clearly the case in
Example 5.13: CYK and CYK' both have TCYK as underlying tree-based
parsing schema.
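
The item refinement of Example 5.13 can be illustrated with a small sketch. It is a naive closure computation, not an efficient parser; the encoding of items as tuples, the function names and the way the grammar is written down are assumptions made for the illustration. It computes the valid CYK' items for the grammar above and contracts them back to CYK items by forgetting the production.

```python
def cyk_prime_valid_items(productions, word):
    """Valid CYK' items (lhs, rhs, i, j) for a CNF grammar, by naive closure."""
    n = len(word)
    valid = set()
    # D(1): [a, j-1, j] |- [A->a, j-1, j]
    for lhs, rhs in productions:
        if len(rhs) == 1:
            for j in range(1, n + 1):
                if word[j - 1] == rhs[0]:
                    valid.add((lhs, rhs, j - 1, j))
    # D(2): [B->beta, i, j], [C->gamma, j, k] |- [A->BC, i, k]
    changed = True
    while changed:
        changed = False
        for lhs, rhs in productions:
            if len(rhs) != 2:
                continue
            left, right = rhs
            for (b, _, i, j) in list(valid):
                if b != left:
                    continue
                for (c, _, j2, k) in list(valid):
                    if c == right and j2 == j and (lhs, rhs, i, k) not in valid:
                        valid.add((lhs, rhs, i, k))
                        changed = True
    return valid


def contract(item):
    """Item contraction: [A->alpha, i, j] is mapped onto the CYK item [A, i, j]."""
    lhs, _rhs, i, j = item
    return (lhs, i, j)


# The grammar S -> AB | BB, A -> a, B -> b from the text.
GRAMMAR = [("S", ("A", "B")), ("S", ("B", "B")), ("A", ("a",)), ("B", ("b",))]

refined = cyk_prime_valid_items(GRAMMAR, "ab")
contracted = {contract(item) for item in refined}
```

With a correctly refined set of deduction steps, `[S->AB, 0, 2]` is valid for ab and `[S->BB, 0, 2]` is not, in contrast to the mischievous refinement described above.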

We will now turn to step refinement, which is rather easier to define
than item refinement. Step refinement is completeness preserving. For
practical applications this is almost as good as correctness preservation,
because soundness is always the easy part and completeness the hard part of a
correctness proof.

Definition 5.14. (step refinement)

Let $\mathbb{P}_1$, $\mathbb{P}_2$ be semiregular parsing systems. The relation $\mathbb{P}_1 \stackrel{sr}{\longrightarrow} \mathbb{P}_2$ holds if

(i) $\mathcal{I}_1 \subseteq \mathcal{I}_2$,

(ii) $\vdash^*_1 \subseteq \vdash^*_2$.

We call $\mathbb{P}_2$ a step refinement of $\mathbb{P}_1$.

Let $\mathbf{P}_1$ and $\mathbf{P}_2$ be semiregular parsing schemata for some class of grammars
$\mathcal{CG}$. The relation $\mathbf{P}_1 \stackrel{sr}{\Longrightarrow} \mathbf{P}_2$ holds if, for each $G \in \mathcal{CG}$ and for each
$a_1 \ldots a_n \in \Sigma^*$,

$\mathbf{P}_1(G)(a_1 \ldots a_n) \stackrel{sr}{\longrightarrow} \mathbf{P}_2(G)(a_1 \ldots a_n)$. □

Note that a sufficient condition for (ii) is $\vdash_1 \subseteq \vdash^*_2$, or even $D_1 \subseteq \vdash^*_2$. We have
written $\vdash^*_1$ in the definition only because of the symmetry. The motivation
for this desire for symmetry will become clear in Chapter 6; we define a series
of relations, all with a similar symmetry.

Corollary 5.15.

The relation $\stackrel{sr}{\longrightarrow}$ is reflexive, transitive and completeness preserving.

The relation $\stackrel{sr}{\Longrightarrow}$ is reflexive, transitive and completeness preserving. □

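Example 5.16.

We define a parsing schema ECYK, that is a (bottom-up) Earley-like
refinement of CYK. Or, to be more precise, a step refinement of CYK'. The
schema ECYK is defined only for grammars in $\mathcal{CNF}$. It is in fact identical
to buE restricted to $\mathcal{CNF}$. For a grammar $G$ in Chomsky Normal Form we
define a parsing system $\mathbb{P}_{ECYK}$ by

$\mathcal{I}_{ECYK} = \{ [A \to \alpha{.}\beta, i, j] \mid A \to \alpha\beta \in P, 0 \le i \le j \}$;

$D^{Init} = \{ \vdash [A \to {.}\alpha, j, j] \}$,

$D^{Scan} = \{ [A \to \alpha{.}a\beta, i, j], [a, j, j+1] \vdash [A \to \alpha a{.}\beta, i, j+1] \}$,

$D^{Compl} = \{ [A \to \alpha{.}B\beta, i, j], [B \to \gamma{.}, j, k] \vdash [A \to \alpha B{.}\beta, i, k] \}$,

$D_{ECYK} = D^{Init} \cup D^{Scan} \cup D^{Compl}$.

In order to prove that CYK' $\stackrel{sr}{\Longrightarrow}$ ECYK it suffices to show, for an
arbitrary grammar $G \in \mathcal{CNF}$, that

(i) $\mathcal{I}_{CYK'} \subseteq \mathcal{I}_{ECYK}$,

(ii) for each $y_1, \ldots, y_k \vdash x \in D_{CYK'}$ it holds that $y_1, \ldots, y_k \vdash^*_{ECYK} x$.

We identify an item $[A \to \alpha, i, j] \in \mathcal{I}_{CYK'}$ with an item $[A \to \alpha{.}, i, j] \in \mathcal{I}_{ECYK}$.
Then, obviously, $\mathcal{I}_{CYK'} \subseteq \mathcal{I}_{ECYK}$.

As to the second condition, let $[a, j-1, j] \vdash [A \to a, j-1, j] \in D_{CYK'}$. Then,
in $\mathbb{P}_{ECYK}$, we have

$\vdash [A \to {.}a, j-1, j-1]$,

$[A \to {.}a, j-1, j-1], [a, j-1, j] \vdash [A \to a{.}, j-1, j]$,

hence $[a, j-1, j] \vdash^*_{ECYK} [A \to a{.}, j-1, j]$.

For a deduction step $[B \to \beta, i, j], [C \to \gamma, j, k] \vdash [A \to BC, i, k] \in D_{CYK'}$ we
have

$\vdash [A \to {.}BC, i, i]$,

$[A \to {.}BC, i, i], [B \to \beta{.}, i, j] \vdash [A \to B{.}C, i, j]$,

$[A \to B{.}C, i, j], [C \to \gamma{.}, j, k] \vdash [A \to BC{.}, i, k]$,

hence we have shown that $[B \to \beta{.}, i, j], [C \to \gamma{.}, j, k] \vdash^*_{ECYK} [A \to BC{.}, i, k]$. □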
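
As an informal check of this example, the ECYK deduction steps can be run on a small grammar. The sketch below is a naive fixpoint computation under assumed encodings: an item $[A \to \alpha{.}\beta, i, j]$ is the tuple (lhs, rhs, dot, i, j), and the grammar is the one used earlier in this section. The completed items, those with the dot at the end, are the ones identified with CYK' items in the proof.

```python
def ecyk_valid_items(productions, word):
    """Valid ECYK items (lhs, rhs, dot, i, j), by naive fixpoint iteration."""
    n = len(word)
    nonterminals = {lhs for lhs, _ in productions}
    # Init: |- [A -> .alpha, j, j] for every production and every position j.
    valid = {(lhs, rhs, 0, j, j) for lhs, rhs in productions for j in range(n + 1)}
    while True:
        new = set()
        for (lhs, rhs, dot, i, j) in valid:
            if dot == len(rhs):
                continue  # completed item, nothing to advance
            symbol = rhs[dot]
            if symbol in nonterminals:
                # Compl: combine with a completed item [B -> gamma., j, k].
                for (lhs2, rhs2, dot2, j2, k) in valid:
                    if lhs2 == symbol and dot2 == len(rhs2) and j2 == j:
                        new.add((lhs, rhs, dot + 1, i, k))
            elif j < n and word[j] == symbol:
                # Scan: consume the terminal item [a, j, j+1].
                new.add((lhs, rhs, dot + 1, i, j + 1))
        if new <= valid:
            return valid
        valid |= new


GRAMMAR = [("S", ("A", "B")), ("S", ("B", "B")), ("A", ("a",)), ("B", ("b",))]
items = ecyk_valid_items(GRAMMAR, "ab")
completed = {(lhs, rhs, i, j) for (lhs, rhs, dot, i, j) in items if dot == len(rhs)}
```

For this grammar and the string ab, the completed items coincide with the valid CYK' items, while `items` additionally contains the partially recognized productions that the step refinement introduces.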

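We can now define refinement as a combination of item refinement and
step refinement. Refinement is a transitive relation, but this time transitivity
is not obvious from the definitions.

Definition 5.17. (refinement)

Let $\mathbb{P}_1$ and $\mathbb{P}_2$ be semiregular parsing systems. The relation $\mathbb{P}_1 \stackrel{ref}{\longrightarrow} \mathbb{P}_2$ holds
if there is a parsing system $\mathbb{P}_3$ such that $\mathbb{P}_1 \stackrel{ir}{\longrightarrow} \mathbb{P}_3 \stackrel{sr}{\longrightarrow} \mathbb{P}_2$.

Let $\mathbf{P}_1$ and $\mathbf{P}_2$ be semiregular parsing schemata. The relation $\mathbf{P}_1 \stackrel{ref}{\Longrightarrow} \mathbf{P}_2$
holds if there is a parsing schema $\mathbf{P}_3$ such that $\mathbf{P}_1 \stackrel{ir}{\Longrightarrow} \mathbf{P}_3 \stackrel{sr}{\Longrightarrow} \mathbf{P}_2$. □

Lemma 5.18. (refinement lemma)

Let $\mathbb{P}_1$, $\mathbb{P}_2$, $\mathbb{P}_3$ be semiregular parsing systems such that $\mathbb{P}_1 \stackrel{sr}{\longrightarrow} \mathbb{P}_2 \stackrel{ir}{\longrightarrow} \mathbb{P}_3$.
Then there is a system $\mathbb{P}_4$ such that $\mathbb{P}_1 \stackrel{ir}{\longrightarrow} \mathbb{P}_4 \stackrel{sr}{\longrightarrow} \mathbb{P}_3$.

Let $\mathbf{P}_1$, $\mathbf{P}_2$, $\mathbf{P}_3$ be semiregular parsing schemata for some class of
grammars $\mathcal{CG}$. Let $\mathbf{P}_1 \stackrel{sr}{\Longrightarrow} \mathbf{P}_2 \stackrel{ir}{\Longrightarrow} \mathbf{P}_3$. Then there is a schema $\mathbf{P}_4$ such that
$\mathbf{P}_1 \stackrel{ir}{\Longrightarrow} \mathbf{P}_4 \stackrel{sr}{\Longrightarrow} \mathbf{P}_3$.

Proof. We only prove the lemma for parsing systems. Generalization to
parsing schemata is as usual.

Let $f$ be the item contraction function from $\mathbb{P}_3$ to $\mathbb{P}_2$. Then we
define $\mathbb{P}_4$ by

$\mathcal{I}_4 = \{ x \in \mathcal{I}_3 \mid f(x) \in \mathcal{I}_1 \}$,

$D_4 = \{ (Y, x) \in \wp_{fin}(H \cup \mathcal{I}_4) \times \mathcal{I}_4 \mid f((Y, x)) \in D_1 \wedge Y \vdash^*_3 x \}$.

Although item contractions are usually specified by regular item mappings,
this is not a requirement. So, in order to prove that $\mathbb{P}_1 \stackrel{ir}{\longrightarrow} \mathbb{P}_4$ we have to
show that the conditions for item contraction in Definition 5.5, applied to
the notion of a semiregular parsing schema, are satisfied. That is, we must
establish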

(ii) /(i;\ 4 ”’) c 

(in) 

(iv) C
(v) $f(C_4) \subseteq C_1$,

{vi) Cl C f{C^), 

{vii) AlCfiAl), 

(via) f{Al) C A\. 

Moreover, in order to prove that $\mathbb{P}_4 \stackrel{sr}{\longrightarrow} \mathbb{P}_3$ we have to show that

(ix) $\mathcal{I}_4 \subseteq \mathcal{I}_3$,

(x) $\vdash^*_4 \subseteq \vdash^*_3$.

The inequalities (ii), (iv), (ix), and (x) follow directly from the definition of
$\mathbb{P}_4$; (viii) is a straightforward extension.

The inequalities (v) and (vi) follow from the fact that $C_4 = C_3$ and $C_1 = C_2$.
In order to prove (i) and (iii) we will first establish an auxiliary result:

(xz) X{Cf{X\), 

A proof of (xz) is straightforward: 

Let X G 2[. Then also x G hence there is an x' G X^ with /(x') = x. 
Thenx'GX^. 

Hence it follows that x £ f{X\). 

The inequalities (z) and (zzz) follow from (zz) and {iv) combined with (xz) 
and the regularity of X\. 

So we are left with (mz), the only case for which a proof requires some effort. 

We will use an ad-hoc notation T h* xi h* . . . h* Xj G A which means 
that there are (possibly empty) sequences Zi^rm for 1 < z < j 

such that 

Y h Zi,i h . . . f- Zi^mi b Xl h ... h Zj,l h . . . h Zj^rn, b Xj G 

Now we prove {vii) as follows. Let T bi xi bi . . . bi xj G A[. Then it 
holds that 

T b2 Xi b2 . . . b2 Xj G A2. 

Moreover, there are Y' G pfin{H\JXs) x2s with f{Y') = Y and x[,. . . , Xj 
with f{x[) = Xi, . . . , f{x^j) = Xj, such that 

r b3 *x;b 3 *...b*x' gzi 5- 

Then, clearly, it follows that Y' x[ b4 . . . b4 x'- G A^^ hence we have 
shown that 

ybixibi...bi Xjef{Ai), 

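We conclude from (i)-(viii) that $\mathbb{P}_1 \stackrel{ir}{\longrightarrow} \mathbb{P}_4$ and from (ix)-(x) that $\mathbb{P}_4 \stackrel{sr}{\longrightarrow} \mathbb{P}_3$.

□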



Theorem 5.19.

The relations $\stackrel{ref}{\longrightarrow}$ and $\stackrel{ref}{\Longrightarrow}$ are transitive and reflexive.

Proof: directly from Lemma 5.18. □



5.3 Some examples of refinement

We will informally discuss a few simple examples of refinement and in one 
case give a proper proof. Every refinement can be split up into two separate 
steps: item refinement and step refinement. Each of those steps can simply 
be the identity relation. 

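Example 5.20. (GCYK $\stackrel{ref}{\Longrightarrow}$ buE)

Generalized CYK is a variant of CYK that can handle arbitrary
context-free grammars. The parsing schema GCYK is specified by a parsing system
$\mathbb{P}_{GCYK}$ for arbitrary $G \in \mathcal{CFG}$, as follows.

$\mathcal{I}_{GCYK} = \{ [A, i, j] \mid A \in N \wedge 0 \le i \le j \}$,

$D^{(1,2)} = \{ [X_1, i_0, i_1], \ldots, [X_k, i_{k-1}, i_k] \vdash [A, i_0, i_k] \mid A \to X_1 \ldots X_k \in P \wedge k \ge 1 \}$,

$D^{(\varepsilon)} = \{ \vdash [A, j, j] \mid A \to \varepsilon \in P \}$,

$D_{GCYK} = D^{(1,2)} \cup D^{(\varepsilon)}$.

Note that for grammars $G \in \mathcal{CNF}$ it holds that $\mathbb{P}_{GCYK} = \mathbb{P}_{CYK}$ (cf.
Example 4.27). The deduction steps in $D^{(1,2)}$ cover productions of the form $A \to BC$
and productions of the form $A \to a$. For grammars in Chomsky Normal Form,
$D^{(\varepsilon)}$ is empty.

Now we claim that a parsing system $\mathbb{P}_{buE}$ (cf. Example 4.34) is a
refinement of $\mathbb{P}_{GCYK}$. As a first step, we refine CYK items $[A, i, j]$ into Earley
items of the form $[A \to \alpha{.}, i, j]$ for every production $A \to \alpha$ for a given
left-hand side $A$. If there is more than one production for $A$ this means a proper
item refinement, otherwise it is just a different notation for the same partial
specification of a tree with root $A$ and yield $a_{i+1} \ldots a_j$. The terminal items
representing the string are denoted $[a, i-1, i]$ as ever. Thus we obtain an
item-refined system $\mathbb{P}_{GCYK'}$.

$\mathcal{I}_{GCYK'} = \{ [A \to \alpha{.}, i, j] \mid A \to \alpha \in P, 0 \le i \le j \}$;

$D^{(1,2)} = \{ [X_1, i_0, i_1], \ldots, [X_k, i_{k-1}, i_k] \vdash [A \to X_1 \ldots X_k{.}, i_0, i_k] \mid A \to X_1 \ldots X_k \in P \wedge k \ge 1 \}$,

where $[X_m, i_{m-1}, i_m]$ denotes $[a, i_{m-1}, i_m]$ if $X_m = a$
and $[X_m, i_{m-1}, i_m]$ denotes $[B \to \beta{.}, i_{m-1}, i_m]$ if $X_m = B$,

$D^{(\varepsilon)} = \{ \vdash [A \to {.}, j, j] \}$,

$D_{GCYK'} = D^{(1,2)} \cup D^{(\varepsilon)}$.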

Next, we can straightforwardly refine $\mathbb{P}_{GCYK'}$ into $\mathbb{P}_{buE}$. Take, for example,
an item of the form $[A \to bBC{.}, i, k]$. This is valid in $\mathbb{P}_{GCYK'}$ iff there are
valid items $[b, i, i+1]$, $[B \to \beta{.}, i+1, j]$ and $[C \to \gamma{.}, j, k]$. In $\mathbb{P}_{buE}$ an item
$[A \to {.}bBC, i, i]$ is always valid. Using the antecedents of the GCYK
deduction step one by one, we deduce a sequence of items $[A \to b{.}BC, i, i+1]$,
$[A \to bB{.}C, i, j]$, $[A \to bBC{.}, i, k]$. □

Example 5.21. (GCYK $\stackrel{ref}{\Longrightarrow}$ buLC $\stackrel{ref}{\Longrightarrow}$ buE)

In Example 4.35 we have introduced a parsing system $\mathbb{P}_{buLC}$ from a system
$\mathbb{P}_{buE}$ by discarding most of the items with a dot in leftmost position. The set
of deduction steps was adapted accordingly. Conversely, one could derive $\mathbb{P}_{buE}$
from $\mathbb{P}_{buLC}$ by inserting the missing items with a dot in leftmost position
and adapting the set of deduction steps. It is left to the reader to verify that
$\mathbb{P}_{buLC} \stackrel{sr}{\longrightarrow} \mathbb{P}_{buE}$ and thus $\mathbb{P}_{buLC} \stackrel{ref}{\longrightarrow} \mathbb{P}_{buE}$. Hence, in general,
buLC $\stackrel{ref}{\Longrightarrow}$ buE.

Similar to Example 5.20, it can also be shown that GCYK $\stackrel{ref}{\Longrightarrow}$ buLC. □

Example 5.22. (LC $\stackrel{ref}{\Longrightarrow}$ Earley)

Similar to Example 5.21, one can show that Earley (cf. Example 4.32) is
in fact a refinement of LC (cf. Example 4.36). The LC schema is more
complicated than buLC, and we will use the occasion to give a somewhat
more formal proof.

Proof. We will prove that $\mathbb{P}_{LC} \stackrel{sr}{\longrightarrow} \mathbb{P}_{Earley}$ for an arbitrary grammar
$G \in \mathcal{CFG}$.

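We abbreviate $\mathbb{P}_{Earley}$ to $\mathbb{P}_E$. We have to prove

(i) $\mathcal{I}_{LC} \subseteq \mathcal{I}_E$,

(ii) $\vdash^*_{LC} \subseteq \vdash^*_E$.

Inequality (i) follows immediately from the definitions. Rather than (ii) we
will prove

(iii) if $(Y, x) \in D_{LC}$ then $Y \vdash^*_E x$,

from which (ii) follows. For the sets of deduction steps and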

this is a direct consequence of (z). It remains to be shown that (Hi) 
holds for and . We will work out the case in 

detail, the other cases are similar. 

Let 

Then, by the definition of it holds that C >* B. Assume C >\ B. 

Then, by the Earley predict we find 

[(7^ — /i, z] [B — z,z] 

and, with a complete step. 

Hence we have shown that [C"-^7.C(5 ,/i,z], [A->>a.,z,j] [B-^A.p,iJ]. 

□

All the above examples involve parsing schemata that are defined on $\mathcal{CFG}$.
We will now look at a few parsing schemata that are defined only on $\mathcal{CNF}$. In
Section 5.4, subsequently, we will extend $\mathcal{CNF}$ schemata to $\mathcal{CFG}$ schemata.

In Section 2.2 we have seen an informal example of Rytter's algorithm
[Rytter, 1985], [Gibbons and Rytter, 1988]. An almost identical algorithm
was described earlier by Brent and Goldschlager [1984], but received little
attention because it was published in a less widely circulated journal. Both
algorithms are described by a single parsing schema that we will call Rytter,
as this is the more familiar algorithm. We will come back to Rytter's
algorithm in Chapter 14.

Example 5.23. (CYK $\stackrel{sr}{\Longrightarrow}$ Rytter)

Apart from the terminal items in $H$, a Rytter parsing schema uses two types
of items. Firstly there are the ordinary CYK items $[A, i, j]$, which comprise
completed trees of the form $\langle A \leadsto a_{i+1} \ldots a_j \rangle$. We also call them complete
items in this context. Secondly, we use almost-complete items for trees of the
form $\langle A \leadsto a_{h+1} \ldots a_i B a_{j+1} \ldots a_k \rangle$.
Such items are denoted $[A, h, k; B, i, j]$. An almost-complete item can be seen
as a CYK item with a gap. If $[A, h, k; B, i, j]$ is valid, and another valid item
$[B, i, j]$ can be deduced, then the gap can be filled and $[A, h, k]$ is also valid.
The gap can also be filled with another almost-complete item; the result is
an almost-complete item, again, but with a smaller gap. A complete item,
finally, can be extended to an almost-complete item by combining it with a
production. If there is a production $A \to BC \in P$ then a complete item $[B, i, j]$
can be extended to an almost-complete item $[A, i, k; C, j, k]$ for arbitrary $k$
(and similarly, an almost-complete item with a leftmost gap can be created
if a valid item is the rightmost right-hand side symbol of a production). As
usual, we do not worry about the fact that $k$ can be extended beyond the
sentence length $n$. For any given sentence one could restrict the set of items
to the set of relevant items for the appropriate sentence length.

For a grammar $G \in \mathcal{CNF}$ we define a parsing schema Rytter as follows.

$\mathcal{I}^{(1)} = \{ [A, i, j] \mid A \in N \wedge 0 \le i < j \}$,

$\mathcal{I}^{(2)} = \{ [A, h, k; B, i, j] \mid A, B \in N \wedge 0 \le h \le i < j \le k \}$,

$\mathcal{I}_{Rytter} = \mathcal{I}^{(1)} \cup \mathcal{I}^{(2)}$,

$D^{(0)} = \{ [a, j-1, j] \vdash [A, j-1, j] \mid A \to a \in P \}$,

$D^{(1a)} = \{ [B, i, j] \vdash [A, i, k; C, j, k] \mid A \to BC \in P \}$,

$D^{(1b)} = \{ [C, j, k] \vdash [A, i, k; B, i, j] \mid A \to BC \in P \}$,

$D^{(2)} = \{ [A, h, k; B, i, j], [B, i, j] \vdash [A, h, k] \}$,

$D^{(3)} = \{ [A, h, m; B, i, l], [B, i, l; C, j, k] \vdash [A, h, m; C, j, k] \}$,

$D_{Rytter} = D^{(0)} \cup D^{(1a)} \cup D^{(1b)} \cup D^{(2)} \cup D^{(3)}$.

The operations associated with the sets of deduction steps $D^{(1)}$, $D^{(2)}$ and
$D^{(3)}$ are originally called activate, pebble, and square, respectively. These
terms stem from a "pebble" problem, where a pebble has to be laid on every
node in a tree. In this context these original names do not make sense and
we rather use numbers.

It is trivial that $\mathbb{P}_{CYK} \stackrel{sr}{\longrightarrow} \mathbb{P}_{Rytter}$. □
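
The deduction steps of the Rytter schema can be tried out with a naive sequential closure. This is only a sketch of the deduction system, not of Rytter's parallel algorithm. The tuple encodings are assumptions: (A, i, j) stands for a complete item and (A, h, k, B, i, j) for an almost-complete item, and positions are restricted to the sentence length.

```python
def rytter_valid_items(productions, word):
    """Valid complete and almost-complete items, by naive fixpoint iteration."""
    n = len(word)
    unary = [(a, rhs[0]) for a, rhs in productions if len(rhs) == 1]
    binary = [(a, rhs[0], rhs[1]) for a, rhs in productions if len(rhs) == 2]
    # D(0): [a, j-1, j] |- [A, j-1, j]
    complete = {(a, j, j + 1) for a, t in unary for j in range(n) if word[j] == t}
    almost = set()
    while True:
        new_complete, new_almost = set(), set()
        for (x, i, j) in complete:
            for (a, b, c) in binary:
                if b == x:  # D(1a): [B, i, j] |- [A, i, k; C, j, k]
                    new_almost |= {(a, i, k, c, j, k) for k in range(j + 1, n + 1)}
                if c == x:  # D(1b): [C, i, j] |- [A, h, j; B, h, i]
                    new_almost |= {(a, h, j, b, h, i) for h in range(0, i)}
        for (a, h, k, b, i, j) in almost:
            if (b, i, j) in complete:  # D(2): fill the gap with a complete item
                new_complete.add((a, h, k))
            for (b2, i2, j2, c, p, q) in almost:  # D(3): fill it with a smaller gap
                if (b2, i2, j2) == (b, i, j):
                    new_almost.add((a, h, k, c, p, q))
        if new_complete <= complete and new_almost <= almost:
            return complete, almost
        complete |= new_complete
        almost |= new_almost


GRAMMAR = [("S", ("A", "B")), ("S", ("B", "B")), ("A", ("a",)), ("B", ("b",))]
complete, almost = rytter_valid_items(GRAMMAR, "ab")
```

The complete items found this way are the valid CYK items for the same string, in line with the observation that Rytter only adds items and deduction steps to CYK.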

In the above example, an intermediate parsing system between $\mathbb{P}_{CYK}$ and
$\mathbb{P}_{Rytter}$ can be defined simply by discarding $D^{(3)}$ from $\mathbb{P}_{Rytter}$. Let's call this
$\mathbb{P}_{R2}$ for short. The system $\mathbb{P}_{R2}$ is a step refinement of $\mathbb{P}_{CYK}$ in the most
literal sense; a CYK deduction step is split up in two steps. It is also clear
that $\mathbb{P}_{R2} \stackrel{sr}{\longrightarrow} \mathbb{P}_{Rytter}$, in a more degenerate way; $D_{Rytter}$ simply contains $D_{R2}$
as a subset.

The problem of such a conceivable parsing system R2, however, is that
it combines disadvantages of both schemata. CYK, on the one hand, finishes
in linear time (in a parallel implementation) with relatively few resources.
Rytter, on the other hand, needs many more resources in order to guarantee
that all valid items are deduced in logarithmic time. The R2 schema has
the same formal complexity bounds as CYK, but when constant factors
are taken into account it simply needs more resources (in time, space and
number of processing units) than CYK.

A more useful intermediate algorithm located between CYK and Rytter's
algorithm is described in [Sikkel, 1993a]: a parallel algorithm for online
parsing that uses $O(n^2)$ processors to parse the next word in constant time. The
classical CYK algorithm can be implemented in $O(n)$ time using $O(n^2)$
processors, as was shown by Kosaraju [1969, 1975], but only if the entire sentence
is known when parsing begins. The online parallel CYK algorithm, assuming
that the parser is fast enough to do all processing before the next word arrives,
finishes in constant time after the last word. The parsing schema for this
algorithm, called OCYK, extends CYK with almost-complete items that have
the gap in rightmost position. Unlike almost-complete Rytter items, there is
no need to specify a position to which this rightmost gap extends.

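Example 5.24. (CYK $\stackrel{sr}{\Longrightarrow}$ OCYK $\stackrel{ref}{\Longrightarrow}$ Rytter)

In addition to $[A, i, j]$ as an abbreviation for $[A \leadsto a_{i+1} \ldots a_j]$ we write
$[A, i, j; B]$ to denote an item $[A \leadsto a_{i+1} \ldots a_j B]$.

We specify a parsing schema OCYK, as usual, by defining a parsing system
$\mathbb{P}_{OCYK}$ for an arbitrary grammar $G \in \mathcal{CNF}$, as follows.

$\mathcal{I}^{(1)} = \{ [A, i, j] \mid A \in N \wedge 0 \le i < j \}$,

$\mathcal{I}^{(2)} = \{ [A, i, j; B] \mid A, B \in N \wedge 0 \le i < j \}$,

$\mathcal{I}_{OCYK} = \mathcal{I}^{(1)} \cup \mathcal{I}^{(2)}$,

$D^{(0)} = \{ [a, j-1, j] \vdash [A, j-1, j] \mid A \to a \in P \}$,

$D^{(1)} = \{ [B, i, j] \vdash [A, i, j; C] \mid A \to BC \in P \}$,

$D^{(2)} = \{ [A, i, j; B], [B, j, k] \vdash [A, i, k] \}$,

$D^{(3)} = \{ [A, i, j; B], [B, j, k; C] \vdash [A, i, k; C] \}$,

$D_{OCYK} = D^{(0)} \cup D^{(1)} \cup D^{(2)} \cup D^{(3)}$.

Clearly, $\mathbb{P}_{CYK} \stackrel{sr}{\longrightarrow} \mathbb{P}_{OCYK}$.

Refining $\mathbb{P}_{OCYK}$ into $\mathbb{P}_{R2}$ (and subsequently to $\mathbb{P}_{Rytter}$) is not limited to
step refinement, this time. Items $[A, i, j; B]$ have to be refined into
almost-complete Rytter items first. □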
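
The OCYK deduction steps can also be sketched as a naive closure. The encodings are again assumptions made for the illustration: (A, i, j) is a complete item and (A, i, j, B) is an item $[A, i, j; B]$ with a rightmost gap. The grammar below is a hypothetical CNF grammar chosen so that step $D^{(3)}$ is actually used; the online, word-by-word processing order of the parallel algorithm is not modelled.

```python
def ocyk_valid_items(productions, word):
    """Valid OCYK items, by naive fixpoint iteration over D(0)-D(3)."""
    n = len(word)
    unary = [(a, rhs[0]) for a, rhs in productions if len(rhs) == 1]
    binary = [(a, rhs[0], rhs[1]) for a, rhs in productions if len(rhs) == 2]
    # D(0): [a, j-1, j] |- [A, j-1, j]
    complete = {(a, j, j + 1) for a, t in unary for j in range(n) if word[j] == t}
    open_items = set()  # (A, i, j, B) encodes [A, i, j; B]
    while True:
        new_complete, new_open = set(), set()
        for (x, i, j) in complete:
            for (a, b, c) in binary:
                if b == x:  # D(1): [B, i, j] |- [A, i, j; C]
                    new_open.add((a, i, j, c))
        for (a, i, j, b) in open_items:
            for (x, j2, k) in complete:  # D(2): [A,i,j;B], [B,j,k] |- [A,i,k]
                if x == b and j2 == j:
                    new_complete.add((a, i, k))
            for (x, j2, k, c) in open_items:  # D(3): [A,i,j;B], [B,j,k;C] |- [A,i,k;C]
                if x == b and j2 == j:
                    new_open.add((a, i, k, c))
        if new_complete <= complete and new_open <= open_items:
            return complete, open_items
        complete |= new_complete
        open_items |= new_open


# Hypothetical grammar: S -> AB, A -> a, B -> BB | b.
GRAMMAR = [("S", ("A", "B")), ("A", ("a",)), ("B", ("B", "B")), ("B", ("b",))]
complete, open_items = ocyk_valid_items(GRAMMAR, "abb")
```

For the string abb the item `("S", 0, 2, "B")` arises through $D^{(3)}$ from `("S", 0, 1, "B")` and `("B", 1, 2, "B")`, before the second b closes the gap.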



5.4 Generalization 

Generalization comprises two notions that may be used in combination. 
Firstly, a refinement, as discussed in 5.2 is a generalization; the refined sys- 
tem is a richer deduction system. Secondly, and more importantly, a parsing 
schema for a narrow class of grammars can be extended to a larger class of 
grammars. Often this can’t be done straightforwardly (otherwise the pars- 
ing schema would simply have been defined on a larger class of grammars) 
but involves refinement as well. As a canonical example, we will see that the 
bottom-up Earley schema is a generalization of the CYK schema. 

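Definition 5.25. (extension)

Let $\mathbf{P}_1$ be a parsing schema for a class of grammars $\mathcal{CG}_1$, $\mathbf{P}_2$ a parsing schema
for a class of grammars $\mathcal{CG}_2$ and $\mathcal{CG}_1 \subseteq \mathcal{CG}_2$.

Then the relation $\mathbf{P}_1 \stackrel{ext}{\Longrightarrow} \mathbf{P}_2$ holds if, for each grammar $G \in \mathcal{CG}_1$ and each
$a_1 \ldots a_n \in \Sigma^*$,

$\mathbf{P}_1(G)(a_1 \ldots a_n) = \mathbf{P}_2(G)(a_1 \ldots a_n)$. □



Definition 5.26. (generalization)

Let $\mathbf{P}_1$, $\mathbf{P}_2$ be semiregular parsing schemata.

Then the relation $\mathbf{P}_1 \stackrel{gen}{\Longrightarrow} \mathbf{P}_2$ holds if there is a semiregular parsing schema
$\mathbf{P}_3$ such that $\mathbf{P}_1 \stackrel{ref}{\Longrightarrow} \mathbf{P}_3 \stackrel{ext}{\Longrightarrow} \mathbf{P}_2$. □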

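Unlike the refinement lemma, it is obvious that if $\mathbf{P}_1 \stackrel{ext}{\Longrightarrow} \mathbf{P}_2 \stackrel{ref}{\Longrightarrow} \mathbf{P}_3$ there
is a $\mathbf{P}_4$ such that $\mathbf{P}_1 \stackrel{ref}{\Longrightarrow} \mathbf{P}_4 \stackrel{ext}{\Longrightarrow} \mathbf{P}_3$. The schema $\mathbf{P}_4$ is obtained simply by
restricting $\mathbf{P}_3$ to the smaller class of grammars for which $\mathbf{P}_1$ is defined.

Corollary 5.27.

The relation $\stackrel{gen}{\Longrightarrow}$ is transitive and reflexive. □

Example 5.28. (CYK $\stackrel{gen}{\Longrightarrow}$ buE)

In Example 5.20 the Generalized CYK schema GCYK has been defined. It
has in fact been shown that

CYK $\stackrel{ext}{\Longrightarrow}$ GCYK $\stackrel{ref}{\Longrightarrow}$ buE. □

Above we have argued that $\stackrel{ext}{\Longrightarrow} \stackrel{ref}{\Longrightarrow}$ can always be replaced by
$\stackrel{ref}{\Longrightarrow} \stackrel{ext}{\Longrightarrow}$. Swapping the relations in Example 5.28 yields the intermediate
system ECYK that has been defined in Example 5.16:

CYK $\stackrel{ref}{\Longrightarrow}$ ECYK $\stackrel{ext}{\Longrightarrow}$ buE.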



5.5 Conclusion 

We have introduced refinement and extension as relations that can be used to 
describe a parsing schema as a generalization of another schema. Refinement 
is the more involved notion; extension simply means applying a schema to a 
larger class of grammars. Generalization is a combination of refinement and 
extension. 

By means of some practical examples, involving algorithms known from
the computer science literature, we have shown that refinement is a useful
notion for relating parsing schemata to one another. It should be noted,
however, that refinements are described between existing schemata. There
is no recipe that allows one to derive a better schema from a given schema by
applying some kind of refinement.

Refinement means more items, more deduction steps, and more things to 
compute. If a refinement produces a “better” schema, then the improvement 
will be qualitative. Refining Generalized CYK to Earley is such an improve- 
ment, because the complexity of the algorithm can be reduced by considering 
partially recognized productions, rather than only completely recognized pro- 
ductions. If a refinement does not obtain such a qualitative improvement, it 
is likely to make a parser less efficient because more work has to be carried 
out. 

In the next chapter we will be concerned with filtering, i.e., improving
the efficiency by discarding parts of a parsing system. Filtering is in some
ways the inverse of refinement. It is used for quantitative improvements in
the efficiency: diminishing the number of valid items and deductions that
have to be applied.



Sometimes it is possible to argue that some deduction steps in a parsing
system cannot contribute to the recognition of a parse. If such deduction
steps exist, no harm is done when these are deleted from the parsing system.
Such optimizations usually do not lead to a decrease in complexity bounds
(otherwise the algorithm was inefficient indeed), but it is always worthwhile
when a (sometimes large) percentage of computation time can be saved by
cutting out redundancies. Optimization in this sense is called filtering.¹ In
this chapter we will define various types of filtering and see that several filters
known from the literature are special cases of the general approach that is
presented here.
The optimization obtained by a filter does not always come for free. The 
cost, usually, is a more complicated description of the parsing schema. The fil- 
tered schema may state explicitly that from a clearly defined set of deduction 
steps only a rather more complicatedly defined subset remains. 

Another side effect of filtering is that it is often at odds with parallel
implementation. The time efficiency of a parallel parser may crucially depend
on a certain redundancy with respect to other resources: space and number
of computing units. A typical example is the Earley parser. In its standard
form, the string is necessarily processed from left to right. If the top-down
filter is deleted (i.e., the predict operator is discarded and any item that could
be predicted is added in advance) one can start parsing at every position in
the sentence in parallel. In that case it is not hard to define a parser that
uses $O(n)$ time on $O(n^2)$ processors. This speed-up can only be obtained at
the cost of redundancy in predicted items.



¹ In addition to syntactic filtering one can also apply semantic filtering, i.e.,
discarding (parts of) parse trees that are syntactically correct but known to be
irrelevant on the basis of extra-syntactical knowledge. In natural language
processing, because of the ambiguity of human language, semantic disambiguation
is a major issue that has generated a vast body of literature. In programming
languages it is sometimes convenient to specify an ambiguous grammar with
additional disambiguation rules (e.g. operator precedence in arithmetic expressions).
Klint and Visser [1994] and Visser [1995] discuss how semantic filtering can be
integrated in the parsing schemata framework. Here we only consider syntactic
filtering.

A more dramatic example where redundancy is essential to speed up a
parallel algorithm is Rytter's algorithm (cf. Examples 2.3, 5.23). It does a vast
amount of redundant work, increasing the number of processors from $O(n^2)$
to $O(n^6)$, in order to finish in logarithmic time. For each binary branching
parse tree there is some way in which it can be constructed in parallel in a
logarithmic number of steps. But as it can't be foretold which way is successful,
one has to try all the ways.

Cutting out redundancy may eliminate possibilities for parallel processing, 
but it is all the more useful in sequential implementations. 

We will make a general distinction between static and dynamic filtering. 
At a practical level, in computer implementations of parsing algorithms, static 
filtering can be done compile-time^ while dynamic filtering is done run-time. 
This is what is suggested by the terms “static” and “dynamic” . On our more 
abstract level of parsing schemata, the characteristic difference is that static 
filtering is independent of the particular string that has to be parsed, whereas 
the effect of dynamic filtering does depend on the string. A static filter can 
be applied when an uninstantiated parsing schema contains items and/or 
derivation steps that are redundant for every input string. These can simply 
be discarded. Dynamic filtering, on top of that, allows certain derivation 
steps to be applied only if it follows from an already explored context in 
the string that such steps are meaningful in that context. That is, additional 
antecedents are required to derive a consequent. 
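
The difference between the two kinds of filter can be sketched on a toy
deduction system. The following is a minimal illustration, not part of the
formalism above: entities are plain strings, a deduction step is a pair
(antecedents, consequent), and the names `closure`, `static_filter`,
`dynamic_filter` and `STEPS` are hypothetical helpers chosen for this sketch.

```python
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

Step = Tuple[FrozenSet[str], str]  # (antecedents, consequent)


def closure(hypotheses: Iterable[str], steps: Iterable[Step]) -> Set[str]:
    """Return every entity derivable from the hypotheses (forward chaining)."""
    valid = set(hypotheses)
    step_list = list(steps)
    changed = True
    while changed:
        changed = False
        for antecedents, consequent in step_list:
            if consequent not in valid and antecedents <= valid:
                valid.add(consequent)
                changed = True
    return valid


def static_filter(steps: Iterable[Step], useless: Set[str]) -> List[Step]:
    """Static filter: drop every step that mentions a discarded entity.

    The decision does not look at the hypotheses (the input string)."""
    return [(a, c) for a, c in steps if c not in useless and not (a & useless)]


def dynamic_filter(steps: Iterable[Step],
                   extra: Dict[str, FrozenSet[str]]) -> List[Step]:
    """Dynamic filter: add antecedents to the steps deriving some consequents.

    Whether such a step still fires now depends on the hypotheses."""
    return [(a | extra.get(c, frozenset()), c) for a, c in steps]


# Toy system (an assumption of this sketch): "dead" is derivable but never
# needed for "goal".
STEPS: List[Step] = [
    (frozenset({"h1"}), "x"),
    (frozenset({"h2"}), "y"),
    (frozenset({"x"}), "dead"),
    (frozenset({"x", "y"}), "goal"),
]
```

Removing `"dead"` statically leaves `"goal"` derivable for every set of
hypotheses, whereas requiring an extra context hypothesis for `"y"` makes the
derivation of `"goal"` depend on what the input provides.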

As a running example in Chapter 6 we will use (a schema for) the algo- 
rithm of de Vreught and Honig [1989, 1991] and define several filters on it. 







Fig. 6.1. The include and concatenate operations of dVH 
As we will see along the way, the algorithm is related to Earley's algorithm
and the LC algorithm. The main difference is that constituents need not be
recognized in a left-to-right manner. The items used by de Vreught and Honig
are double dotted items of the form $[A \to \alpha\bullet\beta\bullet\gamma, i, j]$,
with the part $\beta$ of the right-hand side already expanded and $\alpha$ and
$\gamma$ still to be recognized. Such an item denotes the set of trees
$[A \to \alpha \langle\beta \leadsto a_{i+1} \ldots a_j\rangle \gamma]$ (cf. Section 4.3).
The algorithm of de Vreught and Honig has two basic steps, called include and
concatenate. The idea of both steps is illustrated in Figure 6.1. The formal
definition should be clear.



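Example 6.1. (dVH1, the algorithm of de Vreught and Honig)

For an arbitrary grammar $G \in \mathcal{CFG}$ and string $a_1 \ldots a_n$ a
derivation system $\mathbb{P}_{dVH1}$ is defined by

  $\mathcal{I}_{dVH1} = \{[A \to \alpha\bullet\beta\bullet\gamma, i, j] \mid A \to \alpha\beta\gamma \in P \wedge 0 \le i \le j\}$;
  $D^{Init} = \{[a, j-1, j] \vdash [A \to \alpha\bullet a\bullet\gamma, j-1, j]\}$,
  $D^{\varepsilon} = \{\vdash [B \to \bullet\bullet, j, j]\}$,
  $D^{Incl} = \{[B \to \bullet\beta\bullet, i, j] \vdash [A \to \alpha\bullet B\bullet\gamma, i, j]\}$,
  $D^{Concat} = \{[A \to \alpha\bullet\beta_1\bullet\beta_2\gamma, i, j], [A \to \alpha\beta_1\bullet\beta_2\bullet\gamma, j, k] \vdash [A \to \alpha\bullet\beta_1\beta_2\bullet\gamma, i, k]\}$,
  $D_{dVH1} = D^{Init} \cup D^{\varepsilon} \cup D^{Incl} \cup D^{Concat}$.

It follows easily (cf. de Vreught and Honig [1989]) that

  $\mathcal{V}^{\le n}(\mathbb{P}_{dVH1}) = \{[A \to \alpha\bullet\beta\bullet\gamma, i, j] \in \mathcal{I} \mid \beta \Rightarrow^{*} a_{i+1} \ldots a_j \wedge (\beta \ne \varepsilon \vee \alpha\gamma = \varepsilon)\}$.

Note that $D^{\varepsilon}$ allows deduction of $[B \to \bullet\bullet, j, j]$ also
for $j > n$, because $D$ is independent of the sentence length. Hence we are
only interested in the set $\mathcal{V}^{\le n}$ of valid items with position
markers not exceeding $n$ (cf. Definition 4.33). □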
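
The four kinds of deduction steps can be turned into a small agenda-driven
recognizer. The sketch below is only one possible reading of the schema, under
the following assumptions: a production is a pair (lhs, rhs tuple), an item
$[A \to \alpha\bullet\beta\bullet\gamma, i, j]$ is encoded as the tuple
(production index, p, q, i, j) with $\beta$ = rhs[p:q], and the function name
`dvh1_recognize` is hypothetical. No attempt is made at efficiency.

```python
from collections import deque


def dvh1_recognize(productions, start, words):
    """Recognize `words` with the Init, epsilon, Incl and Concat steps."""
    n = len(words)
    chart = set()
    agenda = deque()

    def add(item):
        if item not in chart:
            chart.add(item)
            agenda.append(item)

    def include(symbol, i, j):
        # Init (symbol is a terminal) and Incl (symbol is a nonterminal) have
        # the same shape: put both dots around one occurrence of the symbol.
        for r, (_, rhs) in enumerate(productions):
            for p, x in enumerate(rhs):
                if x == symbol:
                    add((r, p, p + 1, i, j))

    # Init: [a, j-1, j] |- [A -> alpha . a . gamma, j-1, j]
    for j, a in enumerate(words, start=1):
        include(a, j - 1, j)

    # Epsilon: |- [B -> . . , j, j] for every empty production
    for r, (_, rhs) in enumerate(productions):
        if not rhs:
            for j in range(n + 1):
                add((r, 0, 0, j, j))

    while agenda:
        r, p, q, i, j = agenda.popleft()
        lhs, rhs = productions[r]
        if p == 0 and q == len(rhs):
            include(lhs, i, j)  # Incl: the whole right-hand side is recognized
        # Concat: join adjacent recognized parts of the same production.
        for (r2, p2, q2, i2, j2) in list(chart):
            if r2 != r:
                continue
            if q == p2 and j == i2:
                add((r, p, q2, i, j2))
            if q2 == p and j2 == i:
                add((r, p2, q, i2, j))

    return any(
        productions[r][0] == start and p == 0
        and q == len(productions[r][1]) and i == 0 and j == n
        for (r, p, q, i, j) in chart
    )
```

For instance, with the productions `[("S", ("S", "+", "S")), ("S", ("a",))]`
the call `dvh1_recognize(prods, "S", ["a", "+", "a"])` accepts, and the two
occurrences of `a` may be recognized in either order, which is the point of
the double-dotted items.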

Throughout Section 6 we write $\mathbb{P}_i$ to denote a parsing system
$\mathbb{P}_i = \langle \mathcal{I}_i, H_i, D_i \rangle$. We will define the
refinement relations on parsing systems $\mathbb{P}$, rather than on general
deduction systems $\mathbb{D}$, because the definitions are motivated by
applications in parsing. It should be clear, however, that all these
relations have obvious generalizations to arbitrary deduction systems
$\mathbb{D}$ and enhanced deduction systems $\mathbb{E}$.



As a first, almost trivial example of static filtering we will look at
redundancy elimination in Section 6.1. Static and dynamic filtering are
illustrated and formally defined in Sections 6.2 and 6.3, respectively. In
Section 6.4 we will look at an even stronger form of filtering called step
contraction, in which sets of deduction steps can be contracted to single
deduction steps. Step contraction is the inverse of step refinement, which was
introduced in Section 5.2. A typical example of step contraction has been given
already in Section 4.6, where a Left-Corner parsing schema was obtained as an
optimization of an Earley parsing schema. In Section 6.5 a taxonomy of
Earley-related parsers is drawn up, making use of the filters defined in
Sections 6.2-6.4. Section 6.6, finally, gives a schematic summary of all types
of relations defined in Chapters 5 and 6.



6.1 Redundancy elimination 

A very simple kind of static filtering is redundancy elimination. If a parsing 
system (or any other deduction system) contains steps that can be deleted 
without affecting the validity of any item, these steps must be redundant. 
The same holds for nonvalid items. As a typical example, an inconsistent 
item can be deleted. 

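Definition 6.2. (redundancy elimination)

Let $\mathbb{P}_1$ and $\mathbb{P}_2$ be semiregular parsing systems. The
relation $\mathbb{P}_1 \xrightarrow{\mathrm{re}} \mathbb{P}_2$ holds if

  (i) $\mathcal{I}_1 \supseteq \mathcal{I}_2$,
  (ii) $D_1 \supseteq D_2$,
  (iii) $\mathcal{V}(\mathbb{P}_1) = \mathcal{V}(\mathbb{P}_2)$.

Let $\mathbf{P}_1$ and $\mathbf{P}_2$ be semiregular parsing schemata for a
class of grammars $\mathcal{CG}$. The relation
$\mathbf{P}_1 \overset{\mathrm{re}}{\Longrightarrow} \mathbf{P}_2$ holds if, for
each $G \in \mathcal{CG}$ and each $a_1 \ldots a_n \in \Sigma^{*}$,

  $\mathbf{P}_1(G)(a_1 \ldots a_n) \xrightarrow{\mathrm{re}} \mathbf{P}_2(G)(a_1 \ldots a_n)$. □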

By definition, redundancy elimination is correctness preserving. 

Corollary 6.3. 

For any semiregular parsing system P it holds that P P^. 

For any semiregular parsing schema P it holds that P P^. □ 

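Example 6.4. (dVH2, redundancy elimination)

We observe that $D_{dVH1}$ is redundant, in the following way.

An item $[A \to \alpha\bullet XYZ\bullet\gamma, i, j]$ can be concatenated in two
different ways:

  $[A \to \alpha\bullet X\bullet YZ\gamma, i, k], [A \to \alpha X\bullet YZ\bullet\gamma, k, j] \vdash [A \to \alpha\bullet XYZ\bullet\gamma, i, j]$;
  $[A \to \alpha\bullet XY\bullet Z\gamma, i, l], [A \to \alpha XY\bullet Z\bullet\gamma, l, j] \vdash [A \to \alpha\bullet XYZ\bullet\gamma, i, j]$.

Moreover, if $[A \to \alpha\bullet XYZ\bullet\gamma, i, j]$ is valid, then each of
the four antecedents is also valid for some value of $k$ and $l$. Hence, if we
delete the former deduction step from $D$, the set of valid items is not
affected.

In general, $[A \to \alpha\bullet\beta\bullet\gamma, i, j]$ with $\beta$ a string
of $k$ symbols, $k \ge 2$, can be deduced in $k-1$ ways. All but one can be
discarded. For an arbitrary grammar $G \in \mathcal{CFG}$ a parsing system
$\mathbb{P}_{dVH2}$ is defined by

  $\mathcal{I}_{dVH2} = \{[A \to \alpha\bullet\beta\bullet\gamma, i, j] \mid A \to \alpha\beta\gamma \in P \wedge 0 \le i \le j\}$;
  $D^{Init} = \{[a, j-1, j] \vdash [A \to \alpha\bullet a\bullet\gamma, j-1, j]\}$,
  $D^{\varepsilon} = \{\vdash [B \to \bullet\bullet, j, j]\}$,
  $D^{Incl} = \{[B \to \bullet\beta\bullet, i, j] \vdash [A \to \alpha\bullet B\bullet\gamma, i, j]\}$,
  $D^{Concat} = \{[A \to \alpha\bullet\beta\bullet X\gamma, i, j], [A \to \alpha\beta\bullet X\bullet\gamma, j, k] \vdash [A \to \alpha\bullet\beta X\bullet\gamma, i, k]\}$,
  $D_{dVH2} = D^{Init} \cup D^{\varepsilon} \cup D^{Incl} \cup D^{Concat}$.

It trivially holds that $\mathcal{I}_{dVH1} = \mathcal{I}_{dVH2}$ and
$D_{dVH2} \subseteq D_{dVH1}$. Moreover, from the above argumentation we know
that $\mathcal{V}(\mathbb{P}_{dVH2}) = \mathcal{V}(\mathbb{P}_{dVH1})$. As this
holds for arbitrary grammars, we conclude
dVH1 $\overset{\mathrm{re}}{\Longrightarrow}$ dVH2. □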
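
The counting argument can be made concrete. The following sketch enumerates
the ways a recognized part of $k$ symbols can be split into two adjacent
parts, and singles out the split in which the right-hand part is a single
symbol, as in the concatenate step that is kept. The function names
`concat_splits` and `kept_split` are assumptions of this illustration.

```python
def concat_splits(beta):
    """All ways to write beta as beta1 beta2 with both parts non-empty.

    Each split corresponds to one concatenate step in the unrestricted
    system that deduces the item with beta between the dots."""
    return [(beta[:m], beta[m:]) for m in range(1, len(beta))]


def kept_split(beta):
    """The single split that remains when the right part must be one symbol."""
    return (beta[:-1], beta[-1:])
```

For a string of $k$ symbols `concat_splits` yields $k-1$ splits, and
`kept_split` always returns one of them, so every item stays derivable after
the other $k-2$ steps have been discarded.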



6.2 Static filtering 

Static filtering means no more and no less than discarding parts of a parsing 
system (or, in general, a deduction system). This idea - and the following 
formal definition - may seem gratuitous; correctness is preserved only if one 
can argue that the deleted parts are indeed not relevant to the correctness 
of the system. But this is precisely why it fits into a general hierarchy of 
filtering. Any filter will do, as long as one is able to argue that the remaining 
system is still complete. 

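Definition 6.5. (static filtering)

Let $\mathbb{P}_1$ and $\mathbb{P}_2$ be semiregular parsing systems. The
relation $\mathbb{P}_1 \xrightarrow{\mathrm{sf}} \mathbb{P}_2$ holds if

  (i) $\mathcal{I}_1 \supseteq \mathcal{I}_2$,
  (ii) $D_1 \supseteq D_2$.

Let $\mathbf{P}_1$ and $\mathbf{P}_2$ be arbitrary parsing schemata for a class
of grammars $\mathcal{CG}$. The relation
$\mathbf{P}_1 \overset{\mathrm{sf}}{\Longrightarrow} \mathbf{P}_2$ holds if, for
each $G \in \mathcal{CG}$ and each $a_1 \ldots a_n \in \Sigma^{*}$,

  $\mathbf{P}_1(G)(a_1 \ldots a_n) \xrightarrow{\mathrm{sf}} \mathbf{P}_2(G)(a_1 \ldots a_n)$. □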

It is obvious that the relations $\xrightarrow{\mathrm{sf}}$ and
$\overset{\mathrm{sf}}{\Longrightarrow}$ are transitive and soundness
preserving. Unlike redundancy elimination, completeness is not automatically
preserved by a static filter. In order to prove that a specific static filter
preserves correctness one should argue that the deleted valid items are indeed
redundant.
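Example 6.6. (dVH3, static filtering)

We will further optimize the parsing system $\mathbb{P}_{dVH2}$ for some
arbitrary grammar $G$. We observe that items of the form
$[A \to \alpha\bullet\beta\bullet, i, j]$ with $|\alpha| \ge 1$ and
$|\beta| \ge 2$ are useless in $\mathbb{P}_{dVH2}$ in the sense that they do not
occur as an antecedent in any derivation step. Hence, these items can be
discarded. This does affect the set of valid items; some of the discarded items
were valid. But, more importantly, none of the discarded items is a final item
(i.e., an item that indicates that a parse exists, cf. Definition 4.20).

Similarly, any item of the form $[A \to \alpha\bullet\beta\bullet\gamma, i, j]$
with $|\alpha| \ge 1$, $|\beta| \ge 2$ and $|\gamma| \ge 1$ can concatenate to the
right, but cannot contribute to the recognition of a final item. The whole set

  $\{[A \to \alpha\bullet\beta\bullet\gamma, i, j] \mid |\alpha| \ge 1 \wedge |\beta| \ge 2\}$

can be considered useless; it does not contain any final item and items in this
set are used as antecedents only to deduce other items in this set. Hence we
delete this set, and discard all deduction steps that have one of these items
as antecedent or as consequent. The deduction system $\mathbb{P}_{dVH3}$ for an
arbitrary grammar $G$ is defined by

  $\mathcal{I}^{(1)} = \{[A \to \alpha\bullet X\bullet\gamma, i, j] \mid A \to \alpha X\gamma \in P \wedge 0 \le i \le j\}$,
  $\mathcal{I}^{(2)} = \{[A \to \bullet X\beta\bullet\gamma, i, j] \mid A \to X\beta\gamma \in P \wedge 0 \le i \le j\}$,
  $\mathcal{I}^{(3)} = \{[A \to \bullet\bullet, j, j] \mid A \to \varepsilon \in P \wedge j \ge 0\}$,
  $\mathcal{I}_{dVH3} = \mathcal{I}^{(1)} \cup \mathcal{I}^{(2)} \cup \mathcal{I}^{(3)}$;
  $D^{Init} = \{[a, j-1, j] \vdash [A \to \alpha\bullet a\bullet\gamma, j-1, j]\}$,
  $D^{\varepsilon} = \{\vdash [B \to \bullet\bullet, j, j]\}$,
  $D^{Incl} = \{[B \to \bullet\beta\bullet, i, j] \vdash [A \to \alpha\bullet B\bullet\gamma, i, j]\}$,
  $D^{Concat} = \{[A \to \bullet\alpha\bullet X\gamma, i, j], [A \to \alpha\bullet X\bullet\gamma, j, k] \vdash [A \to \bullet\alpha X\bullet\gamma, i, k]\}$,
  $D_{dVH3} = D^{Init} \cup D^{\varepsilon} \cup D^{Incl} \cup D^{Concat}$.

From the above discussion it follows that

  $\ldots \cup \{[A \to \alpha\bullet X\bullet\gamma, i, j] \in \mathcal{I} \mid X \Rightarrow^{*} a_{i+1} \ldots a_j\}$.

Moreover, clearly, dVH2 $\overset{\mathrm{sf}}{\Longrightarrow}$ dVH3. □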
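
The effect of this static filter on the item set can be counted per
production. The sketch below is an illustration under stated assumptions: a
double-dotted item for a production with a right-hand side of length $m$ is
represented by its two dot positions $(p, q)$ with $0 \le p \le q \le m$, so
that $|\alpha| = p$ and $|\beta| = q - p$; position markers are ignored; the
function names are hypothetical.

```python
def dvh2_dot_pairs(m):
    """All dot placements (p, q), 0 <= p <= q <= m, of the unfiltered item set."""
    return [(p, q) for p in range(m + 1) for q in range(p, m + 1)]


def in_dvh3(p, q, m):
    """True if the dot placement survives in I(1), I(2) or I(3)."""
    beta_len = q - p
    if m == 0:
        return True                       # I(3): item for an empty production
    if beta_len == 1:
        return True                       # I(1): a single symbol between the dots
    return p == 0 and beta_len >= 1       # I(2): nothing before the first dot


def dvh3_dot_pairs(m):
    """Dot placements that remain after static filtering."""
    return [(p, q) for (p, q) in dvh2_dot_pairs(m) if in_dvh3(p, q, m)]
```

For a right-hand side of length 3 the unfiltered set has 10 dot placements and
5 remain, so the number of item shapes per production drops from quadratic to
linear in the length of the right-hand side.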







6.3 Dynamic filtering 

The purpose of filtering is to reduce the work that needs to be done to
derive all valid entities. In static filtering we did so by discarding
“redundant” parts of the derivation system. It is called static, because the
redundancy is independent of the particular string that is to be parsed. In a
real parser this means that the filter can be applied at compile time. Dynamic
filtering is more powerful. The recognition of items can be made dependent on
the existence of other items. In this way context can be taken into account.
If we have, for example, an item $[B \to \bullet\beta\bullet, i, j]$ and a
production $A \to BC$ then we could restrict the deduction step

  $[B \to \bullet\beta\bullet, i, j] \vdash [A \to \bullet B\bullet C, i, j]$

to those cases where $a_{j+1}$ could be the first word of a string produced by
$C$. That is, we could replace the deduction by a set of deductions

  $[B \to \bullet\beta\bullet, i, j], [a, j, j+1] \vdash [A \to \bullet B\bullet C, i, j]$

only for those $a$ such that $a \in$ First($C$) (cf. Definition 6.10). Hence,
dynamic filtering, on a theoretical level, is simply adding antecedents to
existing deductions.

In the following definition, static filtering is a special subcase of dynamic
filtering. This fits the interpretation that static filtering materializes as
compile-time optimization and dynamic filtering materializes as run-time
optimization; an optimization that can be done at compile time could also be
done at run time instead.

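Definition 6.7. (dynamic filtering)

Let $\mathbb{P}_1$ and $\mathbb{P}_2$ be semiregular parsing systems. The
relation $\mathbb{P}_1 \xrightarrow{\mathrm{df}} \mathbb{P}_2$ holds if

  (i) $\mathcal{I}_1 \supseteq \mathcal{I}_2$,
  (ii) $\vdash_1 \supseteq \vdash_2$.

Let $\mathbf{P}_1$ and $\mathbf{P}_2$ be semiregular parsing schemata for a
class of grammars $\mathcal{CG}$. The relation
$\mathbf{P}_1 \overset{\mathrm{df}}{\Longrightarrow} \mathbf{P}_2$ holds if, for
each $G \in \mathcal{CG}$ and each $a_1 \ldots a_n \in \Sigma^{*}$,

  $\mathbf{P}_1(G)(a_1 \ldots a_n) \xrightarrow{\mathrm{df}} \mathbf{P}_2(G)(a_1 \ldots a_n)$. □

As with static filtering, it is obvious that $\xrightarrow{\mathrm{df}}$ and
$\overset{\mathrm{df}}{\Longrightarrow}$ are transitive and soundness
preserving.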



Example 6.8. (buE $\overset{\mathrm{df}}{\Longrightarrow}$ Earley)

The parsing schemata buE and Earley have been defined in Examples 4.34
and 4.32, respectively. In order to verify that
buE $\overset{\mathrm{df}}{\Longrightarrow}$ Earley holds, we compare the sets
$D_{Earley}$ and $D_{buE}$ for an arbitrary grammar. The item sets are
identical. The scan and complete steps are identical in both schemata. For the
predict and init steps, it suffices to verify that

  $Y \vdash_{Earley} [B \to \bullet\gamma, j, j]$ in $\mathbb{P}_{Earley}$ holds only if
  $Y \vdash_{buE} [B \to \bullet\gamma, j, j]$ in $\mathbb{P}_{buE}$.

This is evidently the case. □

Example 6.9. (buLC $\overset{\mathrm{df}}{\Longrightarrow}$ LC)

See Examples 4.35, 4.36 for the definitions of buLC and LC.

Similar to the previous example. □



As another example of dynamic filtering, we will look at the algorithm of
de Vreught and Honig again. The more sophisticated version of the algorithm
uses bottom-up filtering, making use of a (one-position) left and right context.
An item $[A \to \alpha\bullet\beta\bullet\gamma, i, j]$ is recognized only when it
can possibly contribute to a parse, given the left context $a_i$ and the right
context $a_{j+1}$. We define the set of context-dependent items
$\mathcal{CI} \subseteq \mathcal{I}$ by

  $\mathcal{CI}(G, a_1 \ldots a_n) = \{[A \to \alpha\bullet\beta\bullet\gamma, i, j] \mid \exists \delta_1, \delta_2, \delta_3, \delta_4 : \#S\$ \Rightarrow^{*} \delta_1 A \delta_2 \wedge \delta_1\alpha \Rightarrow^{*} \delta_3 a_i \wedge \beta \Rightarrow^{*} a_{i+1} \ldots a_j \wedge \gamma\delta_2 \Rightarrow^{*} a_{j+1}\delta_4\}$.

Here we use, for the first time, the beginning-of-sentence and end-of-sentence
markers. These guarantee that every word, including the first and the last
word, has a left and a right neighbour. The beginning-of-sentence marker could
be deleted, at the expense of formulating special constraints for $i = 0$. The
use of the end-of-sentence marker is essential, because it is the only way to
define the nonexistence of the $(n+1)$-st word.

We are to design the system FdVHs now, in such a way that V-'^{FdVH 3 ) Q 
$\mathcal{CI}$. But we cannot simply restrict the domain from $\mathcal{I}$ to
$\mathcal{CI}$, because $\mathcal{CI}$ does depend on the string to be parsed
and the domain must be independent of the sentence. Hence we take a different
line and operationalize the test for membership of $\mathcal{CI}$ within the
parsing schema. We can simply follow de Vreught and Honig [1989] using the
functions First and Follow [Aho and Ullman, 1977], and their right-to-left
counterparts Last and Precede.

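Definition 6.10. (Context, First, Follow, Last, Precede)

We will use First($\alpha$) and Last($\alpha$) only for strings $\alpha$ such
that $\alpha \not\Rightarrow^{*} \varepsilon$.

  First($\alpha$) $= \{a \mid \exists\beta : \alpha \Rightarrow^{*} a\beta\}$,
  Last($\alpha$) $= \{a \mid \exists\beta : \alpha \Rightarrow^{*} \beta a\}$,
  Follow($X$) $= \{a \mid \exists\alpha, \beta : \#S\$ \Rightarrow^{*} \alpha X a \beta\}$,
  Precede($X$) $= \{a \mid \exists\alpha, \beta : \#S\$ \Rightarrow^{*} \alpha a X \beta\}$.

The predicates LContext, RContext and Context are defined by

  LContext($A, \alpha, a$) $= \exists b \in$ Precede($A$) $: a \in$ Last($b\alpha$),
  RContext($A, \gamma, c$) $= \exists b \in$ Follow($A$) $: c \in$ First($\gamma b$),
  Context($A, \alpha, \gamma, a, c$) $=$ LContext($A, \alpha, a$) $\wedge$ RContext($A, \gamma, c$). □

Footnote: We take advantage of the fact that First($\alpha$) is used only for
$\alpha$ that do not rewrite to (or are) the empty string. In the more general
case, where First is used in any context, one needs a more complicated function

  First($\alpha$) $= \{a \mid \exists\beta, \gamma, \delta : \#S\$ \Rightarrow^{*} \beta\alpha\gamma \wedge \alpha\gamma \Rightarrow^{*} a\delta\}$.

Similarly for Last.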
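
These sets can be computed by the usual fixpoint iteration. The sketch below
is one way to do so, under the following assumptions: productions are pairs
(lhs, rhs tuple), Follow and Precede are computed for nonterminals only, Last
and Precede are obtained by applying the First and Follow computations to the
grammar with all right-hand sides reversed, and all function names are
hypothetical.

```python
def nullable_set(prods):
    """Nonterminals that can be rewritten to the empty string."""
    nullable, changed = set(), True
    while changed:
        changed = False
        for lhs, rhs in prods:
            if lhs not in nullable and all(x in nullable for x in rhs):
                nullable.add(lhs)
                changed = True
    return nullable


def first_sets(prods, nonterminals):
    """First(A) for every nonterminal A, together with the nullable set."""
    nullable = nullable_set(prods)
    first = {A: set() for A in nonterminals}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in prods:
            for x in rhs:
                new = first[x] if x in nonterminals else {x}
                if not new <= first[lhs]:
                    first[lhs] |= new
                    changed = True
                if x not in nullable:
                    break
    return first, nullable


def first_of(alpha, first, nullable):
    """First of a string of symbols; unknown symbols count as terminals."""
    out = set()
    for x in alpha:
        out |= first.get(x, {x})
        if x not in nullable:
            break
    return out


def follow_sets(prods, nonterminals, start, marker):
    """Follow(A) for every nonterminal A; `marker` follows the start symbol."""
    first, nullable = first_sets(prods, nonterminals)
    follow = {A: set() for A in nonterminals}
    follow[start].add(marker)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in prods:
            for k, x in enumerate(rhs):
                if x not in nonterminals:
                    continue
                rest = rhs[k + 1:]
                new = first_of(rest, first, nullable)
                if all(y in nullable for y in rest):
                    new |= follow[lhs]
                if not new <= follow[x]:
                    follow[x] |= new
                    changed = True
    return follow


def make_context(prods, nonterminals, start):
    """Build LContext and RContext predicates for one grammar."""
    first, nullable = first_sets(prods, nonterminals)
    follow = follow_sets(prods, nonterminals, start, "$")
    rev = [(lhs, tuple(reversed(rhs))) for lhs, rhs in prods]
    last, _ = first_sets(rev, nonterminals)       # Last = First, right to left
    precede = follow_sets(rev, nonterminals, start, "#")

    def r_context(A, gamma, c):
        return any(c in first_of(tuple(gamma) + (b,), first, nullable)
                   for b in follow[A])

    def l_context(A, alpha, a):
        # Last(b alpha) is First of the mirrored string in the mirrored grammar.
        return any(a in first_of(tuple(reversed(alpha)) + (b,), last, nullable)
                   for b in precede[A])

    return l_context, r_context, follow, precede
```

Context($A, \alpha, \gamma, a, c$) is then the conjunction
`l_context(A, alpha, a) and r_context(A, gamma, c)`. Since none of these
functions looks at the input string, the results can be tabulated once per
grammar.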

Corollary 6.11.

$[A \to \alpha\bullet\beta\bullet\gamma, i, j] \in \mathcal{CI}$ iff
$\beta \Rightarrow^{*} a_{i+1} \ldots a_j$ and Context($A, \alpha, \gamma, a_i, a_{j+1}$). □

The notion Context is not dependent on a particular input string
$a_1 \ldots a_n$. We can now proceed with the definition of a parsing schema
for the dVH algorithm that takes context into account. We will actually give
two such schemata, dVH4 and dVH5, being dynamically filtered versions of dVH1
and dVH3 respectively.



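Example 6.12. (dVH4, dynamic filtering)

For arbitrary $G \in \mathcal{CFG}$ a parsing system $\mathbb{P}_{dVH4}$ is
defined by

  $\mathcal{I}_{dVH4} = \{[A \to \alpha\bullet\beta\bullet\gamma, i, j] \mid A \to \alpha\beta\gamma \in P \wedge 0 \le i \le j\}$;
  $D^{Init} = \{[a, j-2, j-1], [b, j-1, j], [c, j, j+1] \vdash [A \to \alpha\bullet b\bullet\gamma, j-1, j] \mid$ Context($A, \alpha, \gamma, a, c$)$\}$,
  $D^{Incl} = \{[a, i-1, i], [B \to \bullet\beta\bullet, i, j], [b, j, j+1] \vdash [A \to \alpha\bullet B\bullet\gamma, i, j] \mid$ Context($A, \alpha, \gamma, a, b$)$\}$,
  $D^{Concat} = \{[A \to \alpha\bullet\beta_1\bullet\beta_2\gamma, i, j], [A \to \alpha\beta_1\bullet\beta_2\bullet\gamma, j, k] \vdash [A \to \alpha\bullet\beta_1\beta_2\bullet\gamma, i, k]\}$,
  $D_{dVH4} = D^{Init} \cup D^{\varepsilon} \cup D^{Incl} \cup D^{Concat}$.

Note that $D^{Concat}_{dVH4} = D^{Concat}_{dVH1}$. There is no need to demand
Context($A, \alpha, \gamma, a_i, a_{k+1}$), because this follows from
Context($A, \alpha, \beta_2\gamma, a_i, a_{j+1}$) and
Context($A, \alpha\beta_1, \gamma, a_j, a_{k+1}$).

The set of relevant valid items is limited to those relevant valid items of
$\mathbb{P}_{dVH1}$ that are members of $\mathcal{CI}$:

  $\mathcal{V}^{\le n}(\mathbb{P}_{dVH4}) = \mathcal{V}^{\le n}(\mathbb{P}_{dVH1}) \cap \mathcal{CI}$
  $\quad = \{[A \to \alpha\bullet\beta\bullet\gamma, i, j] \in \mathcal{I} \mid \beta \Rightarrow^{*} a_{i+1} \ldots a_j \wedge (\beta \ne \varepsilon \vee \alpha\gamma = \varepsilon) \wedge \exists \delta_1, \delta_2 : \#S\$ \Rightarrow^{*} \delta_1 a_i A a_{j+1} \delta_2\}$.

We have defined operators of $\mathbb{P}_{dVH4}$ by adding antecedents to
operators of $\mathbb{P}_{dVH1}$. Hence, if $Y \vdash_{dVH4} x$ it follows a
fortiori that $Y \vdash_{dVH1} x$ and we conclude

  dVH1 $\overset{\mathrm{df}}{\Longrightarrow}$ dVH4. □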

We have applied two filters to the parsing schema dVH1. On the one 
hand, statically, we have discarded items that cannot contribute to the recog- 
nition of a valid item. On the other hand, dynamically, we have taken context 
into account in the definition of the deduction steps. These optimizations are 
orthogonal, in the sense that they don’t interfere with each other. The final 
version of dVH is obtained simply by merging the two filters. 

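Example 6.13. (dVH5, final version)

For an arbitrary context-free grammar the parsing system $\mathbb{P}_{dVH5}$ is
defined by

  $\mathcal{I}^{(1)} = \{[A \to \alpha\bullet X\bullet\gamma, i, j] \mid A \to \alpha X\gamma \in P \wedge 0 \le i \le j\}$,
  $\mathcal{I}^{(2)} = \{[A \to \bullet X\beta\bullet\gamma, i, j] \mid A \to X\beta\gamma \in P \wedge 0 \le i \le j\}$,
  $\mathcal{I}^{(3)} = \{[A \to \bullet\bullet, j, j] \mid A \to \varepsilon \in P \wedge j \ge 0\}$,
  $\mathcal{I}_{dVH5} = \mathcal{I}^{(1)} \cup \mathcal{I}^{(2)} \cup \mathcal{I}^{(3)}$;
  $D^{Init} = \{[a, j-2, j-1], [b, j-1, j], [c, j, j+1] \vdash [A \to \alpha\bullet b\bullet\gamma, j-1, j] \mid$ Context($A, \alpha, \gamma, a, c$)$\}$,
  $D^{\varepsilon} = \{\ldots \vdash \ldots\}$,
  $D^{Incl} = \{[a, i-1, i], [B \to \bullet\beta\bullet, i, j], [b, j, j+1] \vdash [A \to \alpha\bullet B\bullet\gamma, i, j] \mid$ Context($A, \alpha, \gamma, a, b$)$\}$,
  $D^{Concat} = \{[A \to \bullet\alpha\bullet X\gamma, i, j], [A \to \alpha\bullet X\bullet\gamma, j, k] \vdash [A \to \bullet\alpha X\bullet\gamma, i, k]\}$,
  $D_{dVH5} = D^{Init} \cup D^{\varepsilon} \cup D^{Incl} \cup D^{Concat}$.

The set of relevant valid items is limited to those relevant valid items in
$\mathbb{P}_{dVH3}$ that are members of $\mathcal{CI}$:

  $\mathcal{V}^{\le n}(\mathbb{P}_{dVH5}) = \{[A \to \alpha\bullet X\beta\bullet\gamma, i, j] \in \mathcal{I} \mid X\beta \Rightarrow^{*} a_{i+1} \ldots a_j \wedge (\alpha = \varepsilon \vee \beta = \varepsilon) \wedge \exists \delta_1, \delta_2 : \#S\$ \Rightarrow^{*} \delta_1 a_i A a_{j+1} \delta_2\}$.

It is left to the reader to verify

  dVH4 $\overset{\mathrm{sf}}{\Longrightarrow}$ dVH5,
  dVH3 $\overset{\mathrm{df}}{\Longrightarrow}$ dVH5. □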

In an algorithm derived from these parsing schemata one can efficiently
implement the left and right context predicates by storing the allowed
preceding/following terminals for every production and dot position in a
table. If this implementation technique is used, it is clear that dVH5 yields
the most efficient parser of all dVH schemata, because the least number of
items is recognized at negligible extra cost per reduction.



6.4 Step contraction 

The final and most powerful kind of filtering is step contraction. As the
name suggests, it is indeed the reverse of the step refinement relation of
Section 5.2. The general idea is the following. When an algorithm takes small
and easy steps, it can sometimes be sped up by taking somewhat larger
and perhaps more complicated steps. Such an optimization will typically
improve an algorithm by a (small) constant factor.

It is paradoxical, perhaps, that both step refinement and step contraction
are useful for improving the practical performance of a parser. The
difference, with respect to practical implementations, is that step refinement
is used for qualitative changes whereas step contraction is merely used for
increasing the efficiency without making changes to the underlying principles
of an algorithm. As a typical example of the former, consider
GCYK $\overset{\mathrm{sr}}{\Longrightarrow}$ buE, which decreases the
complexity of a parser from $O(n^{\varrho+1})$ to $O(n^3)$, where $\varrho$ is
the length of the longest right-hand side. An example of the latter is
Earley $\overset{\mathrm{sc}}{\Longrightarrow}$ GHR, the schema for the improved
Earley parser that was described by Graham, Harrison and Ruzzo [1980].

A consequence of this paradox is that step refinement and step contraction 
per se are not necessarily useful. Too much refinement yields unproductive 
intermediate results, while too much contraction may lead to a more com- 
plex algorithm. But the purpose of our formalism of parsing schemata is not 
primarily that it can be used to improve parsers; the main objective is to de- 
scribe at the right level of abstraction how parsers are related to one another 
and what precisely is improved by introducing certain variants. 

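Definition 6.14. (step contraction)

Let $\mathbb{P}_1$, $\mathbb{P}_2$ be semiregular parsing systems. The relation
$\mathbb{P}_1 \xrightarrow{\mathrm{sc}} \mathbb{P}_2$ holds if

  (i) $\mathcal{I}_1 \supseteq \mathcal{I}_2$,
  (ii) $\vdash^{*}_1 \supseteq \vdash^{*}_2$.

Let $\mathbf{P}_1$ and $\mathbf{P}_2$ be semiregular parsing schemata for some
class of grammars $\mathcal{CG}$. The relation
$\mathbf{P}_1 \overset{\mathrm{sc}}{\Longrightarrow} \mathbf{P}_2$ holds if, for
each $G \in \mathcal{CG}$ and for each $a_1 \ldots a_n \in \Sigma^{*}$,

  $\mathbf{P}_1(G)(a_1 \ldots a_n) \xrightarrow{\mathrm{sc}} \mathbf{P}_2(G)(a_1 \ldots a_n)$. □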

Corollary 6.15.

For any two parsing systems $\mathbb{P}_1$, $\mathbb{P}_2$ or parsing schemata
$\mathbf{P}_1$, $\mathbf{P}_2$ it holds that

  $\mathbb{P}_1 \xrightarrow{\mathrm{sc}} \mathbb{P}_2$ if and only if $\mathbb{P}_2 \xrightarrow{\mathrm{sr}} \mathbb{P}_1$;
  $\mathbf{P}_1 \overset{\mathrm{sc}}{\Longrightarrow} \mathbf{P}_2$ if and only if $\mathbf{P}_2 \overset{\mathrm{sr}}{\Longrightarrow} \mathbf{P}_1$. □

Any dynamic filter, as a consequence, is also a step contraction - although
of a somewhat degenerate form: no real contraction of sequences of deduction
steps takes place. As for proper step contractions, we could in principle
make a difference between static step contractions (multiple steps in $D_1$ are
contracted to single steps in $D_2$) and dynamic step contractions (also
including addition of antecedents). This is of little use and only leads to
more complicated definitions. All the following examples belong to the static
kind.

Example 6.16. (Earley vs. Left-Corner)

In Example 5.22 we have shown that Earley is a step refinement of LC.
It makes more sense to define it the other way round, because we have
constructed the LC schema (cf. Example 4.36) as a slightly more efficient
variant of Earley.

The same holds for the bottom-up variants of both algorithms. Hence,

  Earley $\overset{\mathrm{sc}}{\Longrightarrow}$ LC;
  buE $\overset{\mathrm{sc}}{\Longrightarrow}$ buLC.

In fact we have already proven this in Examples 4.36 and 4.35 where the
Left-Corner schemata were introduced by stripping some redundancies from
the Earley schemata. □

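Example 6.17. (dVH3 $\overset{\mathrm{sc}}{\Longrightarrow}$ buLC)

See Examples 6.6 and 4.35 for definitions of dVH3 and buLC. As usual,
we consider parsing systems $\mathbb{P}_{dVH3}$ and $\mathbb{P}_{buLC}$ for an
arbitrary grammar $G$ and string $a_1 \ldots a_n$.

First, we show that $\mathcal{I}_{buLC} \subseteq \mathcal{I}_{dVH3}$. There is
a notational difference, because $\mathbb{P}_{dVH3}$ uses double-dotted and
$\mathbb{P}_{buLC}$ single-dotted items. But it is clear that items
$[A \to \alpha\bullet\beta, i, j]$ and $[A \to \bullet\alpha\bullet\beta, i, j]$
are just different denotations for a single item

  $[A \to \langle\alpha \leadsto a_{i+1} \ldots a_j\rangle \beta]$.

So we observe that
$\mathcal{I}_{buLC} = \mathcal{I}^{(2)}_{dVH3} \cup \mathcal{I}^{(3)}_{dVH3}$,
hence $\mathcal{I}_{buLC} \subseteq \mathcal{I}_{dVH3}$.

Next, we have to show that $\vdash^{*}_{buLC} \subseteq \vdash^{*}_{dVH3}$. It
suffices to show that for every deduction step
$\eta_1, \ldots, \eta_k \vdash \xi \in D_{buLC}$ it holds that
$\eta_1, \ldots, \eta_k \vdash^{*}_{dVH3} \xi$. We check each type of deduction
step in $\mathbb{P}_{buLC}$:

• $D^{\varepsilon}_{buLC} \subseteq D^{\varepsilon}_{dVH3}$ by definition.

• $D^{LC(a)}_{buLC} \subseteq D^{Init}_{dVH3}$ by definition.

• $D^{LC(A)}_{buLC} \subseteq D^{Incl}_{dVH3}$ by definition.

• $D^{Scan}_{buLC}$: an arbitrary deduction step

  $[A \to \bullet\alpha\bullet a\beta, i, j], [a, j, j+1] \vdash [A \to \bullet\alpha a\bullet\beta, i, j+1]$

  is emulated in $\mathbb{P}_{dVH3}$ by

  $[a, j, j+1] \vdash [A \to \alpha\bullet a\bullet\beta, j, j+1]$,
  $[A \to \bullet\alpha\bullet a\beta, i, j], [A \to \alpha\bullet a\bullet\beta, j, j+1] \vdash [A \to \bullet\alpha a\bullet\beta, i, j+1]$.

• $D^{Compl}_{buLC}$: an arbitrary deduction step

  $[A \to \bullet\alpha\bullet B\beta, i, j], [B \to \bullet\gamma\bullet, j, k] \vdash [A \to \bullet\alpha B\bullet\beta, i, k]$

  is emulated in $\mathbb{P}_{dVH3}$ by

  $[B \to \bullet\gamma\bullet, j, k] \vdash [A \to \alpha\bullet B\bullet\beta, j, k]$,
  $[A \to \bullet\alpha\bullet B\beta, i, j], [A \to \alpha\bullet B\bullet\beta, j, k] \vdash [A \to \bullet\alpha B\bullet\beta, i, k]$.

Hence we conclude that $\vdash^{*}_{buLC} \subseteq \vdash^{*}_{dVH3}$.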

Next, we will introduce the improvement of the Earley algorithm by Graham,
Harrison and Ruzzo [1980], also known as the GHR algorithm. It has
been designed as a step contraction of the Earley algorithm. A bottom-up
variant of GHR also exists.

Another step contraction on bottom-up GHR, which will be treated
subsequently, has been defined by Chiang and Fu [1984]. This last variant
allows parallel implementations where it takes exactly n steps to parse a
sentence of length n (rather than O(n) steps involving a constant that is
dependent on the grammar, as in bottom-up Earley, or maximally 2n steps as in
the GHR algorithm).

Example 6.18. (GHR) 

The algorithm of Graham, Harrison and Ruzzo makes two improvements 
upon the original definition of Earley: 

• nullable symbols (i.e. symbols that can be rewritten to the empty string) 
can be skipped when the dot is worked rightwards through a production; 

• chain derivations (i.e. derivations of the form $A \Rightarrow^{+} B$) are
  reduced to single steps.
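
Both improvements rest on two relations that depend on the grammar only and
can be tabulated in advance: which symbols are nullable, and which
nonterminals are connected by a chain derivation. A minimal sketch, assuming
productions are given as pairs (lhs, rhs tuple); the function names are
hypothetical and the table returned by `chain_closure` is the reflexive
variant, i.e. it also relates every nonterminal to itself.

```python
def nullable_symbols(prods):
    """Nonterminals that can be rewritten to the empty string."""
    nullable, changed = set(), True
    while changed:
        changed = False
        for lhs, rhs in prods:
            if lhs not in nullable and all(x in nullable for x in rhs):
                nullable.add(lhs)
                changed = True
    return nullable


def chain_closure(prods, nonterminals):
    """Map each nonterminal B to the set of nonterminals C with B =>* C.

    A production B -> alpha C beta contributes a chain edge B -> C when
    every symbol in alpha and beta is nullable."""
    nullable = nullable_symbols(prods)
    edges = {A: set() for A in nonterminals}
    for lhs, rhs in prods:
        for k, x in enumerate(rhs):
            others = rhs[:k] + rhs[k + 1:]
            if x in nonterminals and all(y in nullable for y in others):
                edges[lhs].add(x)
    reach = {A: {A} for A in nonterminals}
    changed = True
    while changed:
        changed = False
        for A in nonterminals:
            new = set()
            for B in reach[A]:
                new |= edges[B]
            if not new <= reach[A]:
                reach[A] |= new
                changed = True
    return reach
```

With these two tables the side conditions of the form
$\beta \Rightarrow^{*} \varepsilon$ and $B \Rightarrow^{*} C$ become simple
look-ups while parsing.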

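For an arbitrary grammar $G$ and string $a_1 \ldots a_n$ we define a parsing
system $\mathbb{P}_{GHR}$ as follows.

  $\mathcal{I}_{GHR} = \{[A \to \alpha\bullet\beta, i, j] \mid A \to \alpha\beta \in P \wedge 0 \le i \le j\}$;
  $D^{Init} = \{\vdash [S \to \beta\bullet\gamma, 0, 0] \mid \beta \Rightarrow^{*} \varepsilon\}$,
  $D^{Scan} = \{[A \to \alpha\bullet a\beta\gamma, i, j], [a, j, j+1] \vdash [A \to \alpha a\beta\bullet\gamma, i, j+1] \mid \beta \Rightarrow^{*} \varepsilon\}$,
  $D^{C1} = \{[A \to \alpha\bullet B\beta\gamma, i, j], [B \to \delta\bullet, j, k] \vdash [A \to \alpha B\beta\bullet\gamma, i, k] \mid i < j < k \wedge \beta \Rightarrow^{*} \varepsilon\}$,
  $D^{C2} = \{[A \to \alpha\bullet B\beta\gamma, i, i], [C \to \delta\bullet, i, j] \vdash [A \to \alpha B\beta\bullet\gamma, i, j] \mid i < j \wedge B \Rightarrow^{*} C \wedge \beta \Rightarrow^{*} \varepsilon\}$,
  $D^{Pred} = \{[A \to \alpha\bullet B\beta, i, j] \vdash [C \to \alpha'\bullet\beta', j, j] \mid B \Rightarrow^{*} C\gamma \wedge \alpha' \Rightarrow^{*} \varepsilon\}$,
  $D_{GHR} = D^{Init} \cup D^{Scan} \cup D^{C1} \cup D^{C2} \cup D^{Pred}$.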

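In order to verify the correctness of GHR - at the same time showing that
Earley $\overset{\mathrm{sc}}{\Longrightarrow}$ GHR - we will split the step
contraction into two separate filters. As an intermediate schema we define
GHR'. For an arbitrary $G$ and $a_1 \ldots a_n$ we define $\mathbb{P}_{GHR'}$ by

  $\mathcal{I}_{GHR'} = \mathcal{I}_{GHR}$,
  $D^{C1}_{GHR'} = \{[A \to \alpha\bullet B\beta\gamma, i, j], [B \to \delta\bullet, j, k] \vdash [A \to \alpha B\beta\bullet\gamma, i, k] \mid \beta \Rightarrow^{*} \varepsilon\}$,
  $D^{C2}_{GHR'} = \{[A \to \alpha\bullet B\beta\gamma, i, i], [C \to \delta\bullet, i, j] \vdash [A \to \alpha B\beta\bullet\gamma, i, j] \mid B \Rightarrow^{*} C \wedge \beta \Rightarrow^{*} \varepsilon\}$,
  $D_{GHR'} = D^{Init}_{GHR} \cup D^{Scan}_{GHR} \cup D^{C1}_{GHR'} \cup D^{C2}_{GHR'} \cup D^{Pred}_{GHR}$.

In the first step, from Earley to GHR', only new deduction steps are added.
These extra deduction steps are contractions of steps that existed already.
Hence we have only introduced redundancy and it holds that
Earley $\overset{\mathrm{re}}{\Longleftarrow}$ GHR'.

Secondly, from GHR' to GHR we will delete some redundancies, but different
ones from those that have just been introduced. It has to be shown that steps
in $D^{C1}_{GHR'}$ are redundant for $i = j$ or $j = k$ and steps in
$D^{C2}_{GHR'}$ are redundant for $i = j$. Take, for example, the case that
$j = k$. If one has

  $[A \to \alpha\bullet B\beta\gamma, i, j], [B \to \delta\bullet, j, j] \vdash [A \to \alpha B\beta\bullet\gamma, i, j] \in D^{C1}_{GHR'}$

then $B$ is nullable. Hence, for any deduction step with consequent
$[A \to \alpha\bullet B\beta\gamma, i, j]$, there is a similar deduction step
that skips the nullable string $B\beta$ and produces
$[A \to \alpha B\beta\bullet\gamma, i, j]$ directly.

The other cases are similar. Thus we conclude that
GHR' $\overset{\mathrm{re}}{\Longrightarrow}$ GHR and hence

  Earley $\overset{\mathrm{sc}}{\Longrightarrow}$ GHR.

The correctness of GHR follows from the observation that

  Earley $\overset{\mathrm{re}}{\Longleftarrow}$ GHR' $\overset{\mathrm{re}}{\Longrightarrow}$ GHR

and the fact that $\mathcal{V}$ is not affected by redundancy elimination. □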
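Example 6.19. (buGHR)

A bottom-up variant of GHR is straightforward from the definitions of buE
and GHR. For an arbitrary grammar $G$ and string $a_1 \ldots a_n$ we define a
parsing system $\mathbb{P}_{buGHR}$ as follows.

  $\mathcal{I}_{buGHR} = \{[A \to \alpha\bullet\beta, i, j] \mid A \to \alpha\beta \in P \wedge 0 \le i \le j\}$;
  $D^{Init} = \{\vdash [A \to \alpha\bullet\beta, j, j] \mid \alpha \Rightarrow^{*} \varepsilon\}$,
  $D^{Scan} = \{[A \to \alpha\bullet a\beta\gamma, i, j], [a, j, j+1] \vdash [A \to \alpha a\beta\bullet\gamma, i, j+1] \mid \beta \Rightarrow^{*} \varepsilon\}$,
  $D^{C1} = \{[A \to \alpha\bullet B\beta\gamma, i, j], [B \to \delta\bullet, j, k] \vdash [A \to \alpha B\beta\bullet\gamma, i, k] \mid i < j < k \wedge \beta \Rightarrow^{*} \varepsilon\}$,
  $D^{C2} = \{[A \to \alpha\bullet B\beta\gamma, i, i], [C \to \delta\bullet, i, j] \vdash [A \to \alpha B\beta\bullet\gamma, i, j] \mid i < j \wedge B \Rightarrow^{*} C \wedge \beta \Rightarrow^{*} \varepsilon\}$,
  $D_{buGHR} = D^{Init} \cup D^{Scan} \cup D^{C1} \cup D^{C2}$.

The fact that buE $\overset{\mathrm{sc}}{\Longrightarrow}$ buGHR and the
correctness of buGHR can be established as in Example 6.18. □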

Example 6.20. (ChF)

A small improvement upon the bottom-up variant of the algorithm of Graham,
Harrison and Ruzzo has been defined by Chiang and Fu [1984]. It is
step contraction in the most literal sense of the word. The deduction steps
are somewhat more complicated, but the basic idea is perfectly clear:

• If an item can be deduced by two complete deduction steps from … and
  $D^{C2}$ in $\mathbb{P}_{buGHR}$, where the consequent of the former is an
  antecedent of the latter, then contract these two steps into a single
  deduction step;

• Similar for … and … in $\mathbb{P}_{buGHR}$.

The deduction steps in … and … remain as they are. The definition of … is
adapted and a second set of scan steps is introduced. This results in
the following parsing system.

  $\mathcal{I}_{ChF} = \{[A \to \alpha\bullet\beta, i, j] \mid A \to \alpha\beta \in P \wedge 0 \le i \le j\}$;
  $D^{Init} = \{\ldots \mid \alpha \Rightarrow^{*} \varepsilon\}$,
  $D^{S1} = \{\ldots, [a, j, j+1] \vdash [A \to \alpha a\beta\bullet\gamma, i, j+1] \mid \ldots\}$,
  $D^{S2} = \{[C \to \gamma'\bullet a\beta', i, j], [a, j, j+1] \vdash [A \to \alpha B\beta\bullet\gamma, i, j+1] \mid B \Rightarrow^{*} C \wedge \alpha\beta\beta' \Rightarrow^{*} \varepsilon\}$,
  $D^{C1} = \{\ldots \mid i < j < k \wedge \beta \Rightarrow^{*} \varepsilon\}$,
  $D^{C2} = \{\ldots \mid i < j \wedge B \Rightarrow^{*} C \wedge \alpha\beta\beta' \Rightarrow^{*} \varepsilon\}$,
  $D_{ChF} = \ldots$

It is left to the reader to verify that
buGHR $\overset{\mathrm{sc}}{\Longrightarrow}$ ChF. □

In ChF, $D^{Init}$ deduces more items than necessary. Only items of the
form $[A \to \alpha\bullet a\beta, j, j]$ are used in subsequent steps. There is
no longer a need for items of the form $[A \to \alpha\bullet B\beta, j, j]$;
their use has disappeared in the step contraction
$\mathbb{P}_{buGHR} \xrightarrow{\mathrm{sc}} \mathbb{P}_{ChF}$. Hence we can
apply another redundancy elimination step. Such minor optimizations have little
impact, however, and we will not pursue them further.



6.5 The family of Earley-like parsing schemata 



We have encountered four types of filters so far: redundancy elimination,
static filtering, dynamic filtering and step contraction. From the definitions
it is obvious that for any class of parsing schemata

  $\overset{\mathrm{re}}{\Longrightarrow} \;\subseteq\; \overset{\mathrm{sf}}{\Longrightarrow} \;\subseteq\; \overset{\mathrm{df}}{\Longrightarrow} \;\subseteq\; \overset{\mathrm{sc}}{\Longrightarrow}$.

We don't need to introduce a general filtering operation, because every filter
is a step contraction. In Figure 6.2, an overview is given of most filtering
relations between parsing schemata discussed in Chapter 6. The arrows are
labelled with the most restricted type of filter that applies in each case.
dVH2 has been left out because it is only an intermediate step in the static
filter from dVH1 to dVH3. Each arrow is also labelled with the number of the
example in which it is discussed.



Theorem 6.21. 

A filtering relation holds between any two parsing schemata displayed in 
Figure 6.2 if and only if they are connected by a sequence of arrows. 

Proof. For the individual arrows, see the examples referred to. Transitivity 
(and reflexivity, for empty sequences) is obvious from the definitions. 

As for the nonexistence of filtering relations, this has to be verified for every
non-connected pair of schemata, but it is always obvious. □
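
The "connected by a sequence of arrows" condition is plain reachability in a
directed graph. The sketch below illustrates this with an edge list collected
from the examples in this chapter (6.4, 6.6, 6.8, 6.9, 6.12, 6.13, 6.16-6.20).
It is an assumption of this sketch that these edges stand in for the drawing
in Figure 6.2: the figure itself omits dVH2 and may contain arrows not listed
here, so a negative answer from `related` only reflects this edge list. The
names `EDGES` and `related` are hypothetical.

```python
# Direct filtering relations stated in the examples of this chapter.
EDGES = {
    "dVH1": ["dVH2", "dVH4"],
    "dVH2": ["dVH3"],
    "dVH3": ["dVH5", "buLC"],
    "dVH4": ["dVH5"],
    "buE": ["Earley", "buLC", "buGHR"],
    "buLC": ["LC"],
    "Earley": ["LC", "GHR"],
    "buGHR": ["ChF"],
}


def related(src, dst):
    """True if dst is reachable from src by zero or more arrows."""
    seen = {src}
    stack = [src]
    while stack:
        node = stack.pop()
        if node == dst:
            return True
        for nxt in EDGES.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False
```

The empty sequence of arrows accounts for reflexivity, and chaining arrows
accounts for transitivity, which mirrors the two ingredients of the proof.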



The filtering relations in Figure 6.2 constitute a directed acyclic graph
with several sources and several sinks. Is there a more general schema from
which both buE and dVH1 can be derived by applying a filter? And can the
filters that produced ChF, GHR, LC and dVH5 be combined, producing a
single, optimally filtered schema? Such schemata can indeed be derived, but
their practical value is doubtful.

Fig. 6.2. Filtering relations between schemata discussed in Chapter 6

We have seen several examples of composite filters that are composed
of “orthogonal” components. We have dealt with dVH1 $\Longrightarrow$ dVH5
extensively; buE $\Longrightarrow$ LC is another case. In Figure 6.3 the
taxonomy of Earley-like parsing schemata is extended with cross-breedings
between the sinks of the graph in Figure 6.2. Not all of these schemata are
equally useful, however.

The optimization of Chiang and Fu leads to a maximum parallel speed-up
of 50 %, but does not speed up a sequential implementation. Hence a
left-to-right version of ChF on a single processor is not faster than GHR -
unless this is taken as a starting point for another static filter, where
intermediate results are discarded that have become redundant by the Chiang
and Fu step contraction.

An LC parser with GHR optimizations, similarly, can be seen as a starting
point for further static filtering.

A parsing schema for an algorithm that does exist in the literature is obtained
by combining dVH3 $\Longrightarrow$ LC and dVH3 $\Longrightarrow$ dVH5: a
left-corner parser with one symbol look-ahead. Our LC schema is the schema for
an LC(0) parser. A one-word look-ahead can be added to LC just as to a dVH
schema without look-ahead. On the other hand, the dVH5 parsing schema could
be classified as dVH(1,1), i.e., a schema for the dVH algorithm with one-word
look-back and look-ahead. The optimization to a buLC(1,1) schema
is straightforward. LC(1) is obtained by adding a top-down filter as usual.
The look-back has become obsolete by the top-down filter. One could also see
it in a different way: a top-down filter constitutes a look-back of unlimited
size. A top-down filtered parser takes everything to the left of a constituent
as context for bottom-up filtering.

The parsing schemata contained in a box in Figure 6.3 are schemata for 
parsers that have been seriously proposed in the literature. The other ones 
have been added only to illustrate filtering and to complete the picture. The 
algorithm of de Vreught and Honig [1989] has in fact a schema that is located 
between dVH4 and dVH5. The authors have overlooked the possibility 
of statically filtering dVH2 into dVH3 and applied the dynamic filter to 
dVH2. 

A “mother” schema from which both dVH1 and buE can be derived by
step contraction is shown at the top of the graph. To call it dVH0 is actually
unfair to de Vreught and Honig: the schema is rather awkward as it has to
combine the inefficiencies of dVH and bottom-up Earley.



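Example 6.22. (dVH0)

For arbitrary $G \in \mathcal{CFG}$ and $a_1 \ldots a_n$ a parsing system
$\mathbb{P}_{dVH0}$ is defined as follows.

  $\mathcal{I}_{dVH0} = \{[A \to \alpha\bullet\beta\bullet\gamma, i, j] \mid A \to \alpha\beta\gamma \in P \wedge 0 \le i \le j\}$;
  $D^{Scan} = \{\ldots, [a, j, j+1] \vdash [A \to \alpha\bullet a\bullet\gamma, j, j+1]\}$,
  $D^{Compl} = \{\ldots \vdash [A \to \alpha\bullet B\bullet\gamma, i, j]\}$,
  $D^{Concat} = \{[A \to \alpha\bullet\beta_1\bullet\beta_2\gamma, i, j], [A \to \alpha\beta_1\bullet\beta_2\bullet\gamma, j, k] \vdash [A \to \alpha\bullet\beta_1\beta_2\bullet\gamma, i, k]\}$,
  $D_{dVH0} = \ldots$

It is left to the reader to verify that

  dVH0 $\overset{\mathrm{sc}}{\Longrightarrow}$ dVH1,
  dVH0 $\overset{\mathrm{sc}}{\Longrightarrow}$ buE. □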



Figure 6.3 is far from complete; a variety of related schemata could be
added. In Section 4.6 we have remarked that the Earley schema is also
the parsing schema of a (generalized) LR(0) parser. One can define filtered
versions that specify LR(k), SLR(k) and LALR(k) parsers. But we have seen
enough examples here. Parsing schemata for LR-parsers will be discussed in
Chapter 12. In Chapter 10 we have a closer look at LC parsers.

Fig. 6.3. A taxonomy of Earley-like parsing schemata
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6.6 A summary of relations between parsing schemata 

All relations on parsing systems that have been introduced in Chapters 5 
and 6 are summarized in Figure 6.4. The same relations apply to deduction 
systems in general (in which case the item set X should be replaced by a 
general domain X, to be consistent with the notation used in Chapter 4). 
Relations that have been defined between parsing schemata are summarized 
in Figure 6.5. 

A more refined taxonomy of relations is possible. One could define static 
step contraction, which is a superclass of static filtering and a subclass of step 
contraction. Step contraction, then, is a combination of static step contraction 
and dynamic filtering. Static step contractions can be described by a specific 
kind of redundancy introduction followed by redundancy elimination. This 
has been illustrated in fact in Example 6.18, where we discussed the static 
step contraction Earley ==> GHR. 

Fig. 6.4. A summary of relations between parsing systems

Fig. 6.5. A summary of relations between parsing schemata

6.7 Conclusion

In this chapter we have been concerned with optimization of parsing schemata. 
We have defined a series of filtering operations that can be used to strip spu- 
rious items and deduction steps from parsing systems. A variety of parsing 
schemata, describing parsing algorithms known from the computer science 
literature, have been captured in a single taxonomy of Earley-related parsers. 

It is surprising, perhaps, that we can make a clear distinction between 
static filtering and dynamic filtering. The former is usually understood as 
“compile-time” optimization, the latter as “run-time” optimization. The dis- 
tinction can be made at the abstract level of parsing schemata, because static 
filters are independent of the string (represented by the hypotheses) and dy- 
namic filters may depend on the string. Static filtering means discarding 
irrelevant parts of a system; dynamic filtering can take context into account 
by adding antecedents to deduction steps. 

The strongest form of filtering, step contraction, is the reverse of step 
refinement that was introduced in Chapter 5. Both operations are used to 
increase the efficiency of parsers: step contraction is used to diminish the work 
to be done, whereas step refinement is useful in transformations that provide 
a qualitative improvement in the parser. It will be clear that step refinement 
or step contraction per se is not a useful operation. Over-refinement will lead 
to too much work; over-contraction will lead to steps that require additional 
sophistication in a parser that implements such a schema. 

The calculus of parsing schemata that has been developed in Chapters 
5-6 is not a tool that guides a parser designer towards a schema for an 
optimally efficient parser. The question whether the individual deduction 
steps (including search techniques to retrieve the relevant antecedents) can 
be implemented efficiently is not discussed at this level of abstraction. Pars- 
ing schemata are a useful tool, however, to describe the relations between 
various parsing algorithms and to explain precisely the nature of certain op- 
timizations. 

We have now finished the formal theory of parsing schemata for context- 
free grammars. In Part III we will use parsing schemata as a tool for various 
applications (hence Part III can be seen as consisting of several, unrelated 
subparts). As a first undertaking, in Chapters 7-9, we will discuss how feature 
structures can be incorporated into parsing schemata, yielding a practical 
parsing schema notation for unification grammars. 

In Part II we have developed a formal theory of parsing schemata for
context-free grammars. In Part III we will apply this theory in several different
directions. In Chapters 7-9, we discuss parsing schemata for unification grammars.
In Chapters 10 and 11 we use parsing schemata to define Left-Corner and
Head-Corner chart parsers. We will prove these to be correct as well. In
Chapters 12 and 13, subsequently, we derive a parsing schema for Tomita’s 
algorithm as an example of an algorithm that is not item-based. As a result, 
we can cross-fertilize the Tomita parser with a parallel bottom-up Earley 
parser, yielding a parallel bottom-up Tomita parser. In Chapter 14, finally, 
we discuss hard-wired implementations of parsing schemata, in the form of 
boolean circuits. 

We will extend parsing schemata with feature structures, so that schemata 
for parsing unification grammars can be defined. In addition to items that 
describe how a parser deals with the context-free backbone of a grammar, we 
will extend the schema with a notation in which one can specify how features 
are transferred from one item to the other. Thus a formalism is obtained in 
which feature percolation in unification grammar parsing can be controlled 
explicitly. Chapter 7 is a brief, informal introduction. In Chapter 8 we give 
a lengthy, formal treatment of the formalism; some more practical aspects of 
unification grammar parsing are discussed in Chapter 9. 

Unification grammars - also called unification-based grammars, con- 
straint-based grammars, or feature structure grammars - are of central im- 
portance to current computational linguistics. As these formalisms are not 
widely known among computer scientists, it seems appropriate to give an 
introduction that should provide some intuition about what we are going to 
formalize. 

In 7.1 a preview is given of what parsing schemata with feature structures
look like. While the notion of feature structures is kept deliberately abstract
and vague, the general idea of such a parsing schema stands out rather clearly.
In 7.2, subsequently, feature structures and unification grammars are informally
introduced by means of an example. We use the PATR formalism of
Shieber [1986], with a tiny change in the notation. Anyone who is familiar
with PATR can skip 7.2.

7.1 Unification-based parsing schemata: a preview

A thorough, formal treatment of unification grammars and parsing schemata 
for these grammars will be given in Chapter 8. As we will see, it requires quite 
some space and effort to do things properly. Parsing algorithms for unification 
grammars constitute a complex problem domain. A wealth of concepts is to 
be introduced, properly defined and - not the least problem - provided with 
clear and precise notations. We will jump ahead now and look at a glimpse 
of what we are heading for. An intuitive understanding of what we are trying 
to formalize may help the reader to get through the formal parts. 

We address the following question: “How can parsing schemata be enhanced
with any kind of information that is added to the context-free backbone
of a grammar?” One may think of attribute grammars, unification grammars,
affix grammars or any other formalism in which such information can be spec- 
ified. We will be unspecific, for good reason. By refusing (for the moment) to 
use a particular formalism we cannot get sidetracked by all its sophisticated 
details. 

In this section we recapitulate a simple context-free parsing schema, give 
an example of the use of other grammatical information, introduce (fragments 
of) a notation for it, and add this to the parsing schema. 

As an example of a context-free parsing schema we recall the Earley
schema of Example 4.32. For an arbitrary grammar $G \in \mathcal{CFG}$ we define a
parsing system $\mathbb{P}_{Earley} = \langle \mathcal{I}, H, D \rangle$, where $\mathcal{I}$ denotes the domain of Earley
items; $H$ (the hypotheses) encodes the string to be parsed; $D$ comprises the
deduction steps that can be used to recognize items. Most deduction steps
are of the form $\eta, \zeta \vdash \xi$. When the antecedents $\eta$ and $\zeta$ have been recognized,
then the consequent $\xi$ can also be recognized. Some deduction steps have only
a single antecedent. Moreover, in order to start parsing, an initial deduction
step with no antecedents is included. $\mathbb{P}_{Earley}$ is defined by

$\mathcal{I}_{Earley} = \{[A \to \alpha \bullet \beta, i, j] \mid A \to \alpha\beta \in P,\ 0 \le i \le j\}$;

$H = \{[a_1, 0, 1], \ldots, [a_n, n-1, n]\}$;

$D^{Init} = \{\vdash [S \to \bullet\gamma, 0, 0]\}$,

$D^{Scan} = \{[A \to \alpha \bullet a\beta, i, j], [a, j, j+1] \vdash [A \to \alpha a \bullet \beta, i, j+1]\}$,

$D^{Compl} = \{[A \to \alpha \bullet B\beta, i, j], [B \to \gamma\bullet, j, k] \vdash [A \to \alpha B \bullet \beta, i, k]\}$,

$D^{Pred} = \{[A \to \alpha \bullet B\beta, i, j] \vdash [B \to \bullet\gamma, j, j]\}$,

$D_{Earley} = D^{Init} \cup D^{Scan} \cup D^{Compl} \cup D^{Pred}$,

where $H$ varies according to the string $a_1 \ldots a_n$ that should be parsed. The
second part of the usual set notation $\{\ldots \mid \ldots\}$ has been deleted in most
cases; by definition, deduction steps may only use items from $\mathcal{I}$ and $H$.
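
To make the deduction steps concrete, the following is a minimal sketch of a recognizer that closes the four step types over the hypotheses with an agenda. It is an illustration only, not an efficient implementation: the tuple encoding of items, the function name `earley_recognize` and the toy grammar in the usage note are assumptions made here, and terminals and nonterminals are assumed to be disjoint.

```python
from collections import deque


def earley_recognize(productions, start, words):
    """Close the Earley deduction steps over the hypotheses for `words`.

    productions: list of (lhs, rhs) pairs, rhs being a tuple of symbols.
    An item (lhs, rhs, dot, i, j) encodes [lhs -> rhs[:dot] . rhs[dot:], i, j].
    The hypothesis [a, j, j+1] holds when words[j] == a.
    """
    nonterminals = {lhs for lhs, _ in productions}
    n = len(words)
    chart, agenda = set(), deque()

    def add(item):
        if item not in chart:
            chart.add(item)
            agenda.append(item)

    # Init: no antecedents, consequent [S -> . gamma, 0, 0]
    for lhs, rhs in productions:
        if lhs == start:
            add((lhs, rhs, 0, 0, 0))

    while agenda:
        lhs, rhs, dot, i, j = agenda.popleft()
        if dot < len(rhs):
            nxt = rhs[dot]
            if nxt in nonterminals:
                # Predict: [A -> alpha . B beta, i, j] |- [B -> . gamma, j, j]
                for b, gamma in productions:
                    if b == nxt:
                        add((b, gamma, 0, j, j))
                # Complete, with the current item as first antecedent
                for b, gamma, d, j2, k in list(chart):
                    if b == nxt and d == len(gamma) and j2 == j:
                        add((lhs, rhs, dot + 1, i, k))
            elif j < n and words[j] == nxt:
                # Scan: combine with the hypothesis [a, j, j+1]
                add((lhs, rhs, dot + 1, i, j + 1))
        else:
            # Complete, with the current item [B -> gamma ., i, j] as
            # second antecedent: advance every item waiting for B at i
            for a, alpha, d, i2, j2 in list(chart):
                if d < len(alpha) and alpha[d] == lhs and j2 == i:
                    add((a, alpha, d + 1, i2, j))

    return any(l == start and d == len(r) and i == 0 and j == n
               for l, r, d, i, j in chart)
```

With the hypothetical grammar `[("S", ("NP", "VP")), ("NP", ("*det", "*n")), ("VP", ("*v", "NP"))]`, the call `earley_recognize(grammar, "S", ["*det", "*n", "*v", "*det", "*n"])` derives a final item for S spanning the whole string. The chart plays the role of the set of recognized items; the order in which the agenda is processed does not affect the result.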

We assume that the context-free backbone of a grammar is enhanced with 
additional syntactic, semantic or other linguistic information. Constituents, 
productions, and items can have certain features¹ that express information
not present in the context-free part of the grammar. This information can 
be of different kinds. A typical use of features is the transfer of information 
through a parse tree. As an example, consider 

In the production $S \to NP\ VP$, the semantics of $S$ can be derived from
the semantics of $NP$ and $VP$ by ...

If each word in the lexicon has some semantics associated with it, and for 
each production it is known how the semantics of the left-hand side is to 
be derived from the right-hand side, the semantics of the sentence can be 
obtained compositionally from its constituents. 

Another typical, more syntactic way in which features are used is to con- 
strain the set of sentences that is acceptable to the parser. A canonical ex- 
ample is 

In the production $S \to NP\ VP$, there must be (some form of) agreement
between $NP$ and $VP$.

The precise nature of the agreement is irrelevant here. Either constituent will 
have some features that could play a role in agreement, e.g. 

the noun phrase “the boy” is masculine, third person singular,

but the fact that agreement is required between NP and VP is a feature of 
the production, not a feature of each of the constituents individually. 

Let us now enhance the Earley parser with such features. If we parse a sentence
“The boy ...”, at some point we will recognize an item $[S \to NP \bullet VP, 0, 2]$.
We could attach the previously stated information to the item, as follows:

The NP in $[S \to NP \bullet VP, 0, 2]$ is masculine, third person singular.

Hence the VP that is to follow must be masculine, third person singular.

Next we apply the predict step

$[S \to NP \bullet VP, 0, 2] \vdash [VP \to \bullet *v\ NP, 2, 2]$,

in combination with a feature of the production $VP \to *v\ NP$:

¹ At this level of abstraction, the word “feature” can be replaced by “attribute”,
“affix”, etc. All of these stand for roughly the same concept, but refer to different
kinds of formalisms.

In the production $VP \to *v\ NP$, the agreement of $VP$ is fully determined
by the agreement of $*v$.

Combining all this information, we obtain the following item annotated with 
features: 

$[VP \to \bullet *v\ NP, 2, 2]$

VP must be masculine, third person singular;
hence *v must be masculine, third person singular.

Gender plays no role in verb forms in English. Demanding that the verb 
form be masculine is irrelevant, but harmless. If the grammar doesn’t specify 
gender for verb forms, it follows that every form of every verb can be used 
in combination with a masculine subject. 

An important concept that must be introduced here is consistency. The 
features of an object are called inconsistent if they contain conflicting in- 
formation. As an example, consider the sentence “The boy scout . . .”, where 
“scout” is known to be both a noun and a verb form. If we continue from the 
previous item and scan a *v, we would obtain 

$[VP \to *v \bullet NP, 2, 3]$

VP must be masculine, third person singular; 
hence *v must be masculine, third person singular. 

*v is either plural or first or second person singular. 

This is inconsistent and therefore not acceptable as a valid item. 

We need to introduce a tiny bit of notation in order to enhance the Earley 
parsing schema with features. The notation will be explained, but not defined 
in a mathematical sense. We write 

• $\varphi_0(A \to \alpha)$ for the features of a production $A \to \alpha$;

• $\varphi(X)$ for the features of a constituent $X$;

• $\varphi([A \to \alpha \bullet \beta, i, j])$ for the features of an item $[A \to \alpha \bullet \beta, i, j]$.

The index 0 for features of productions is to indicate that these are taken 
straight from the grammar. In both other cases, features may have accumu- 
lated by transfer from previously recognized constituents and/or items. 

The features of an item comprise the features of the production and those of 
its constituents (as far as these are known yet). From an item, the features 
of each constituent mentioned in that item can be retrieved. 

We will not (yet) define a domain of expressions in which features can be
formulated. This is left to the imagination of the reader. We need some notation,
however, to relate sets of features to one another. Combining the features of
objects $\xi$ and $\eta$ is denoted by $\varphi(\xi) \sqcup \varphi(\eta)$. The square union ($\sqcup$) may be
interpreted as conventional set union ($\cup$) if it is understood that we accumulate
sets of features. Similarly, we write $\varphi(\xi) \sqsubseteq \varphi(\eta)$ (which may be interpreted as
$\varphi(\xi) \subseteq \varphi(\eta)$) to denote that an object $\eta$ has at least all features and values
of an object $\xi$ but may have other features and values as well.
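
Under the set interpretation suggested above, these three notions can be sketched in a few lines. The encoding is an assumption made purely for illustration: a feature set is a set of (feature, value) pairs, and the names `combine`, `subsumes` and `consistent` are hypothetical. The disjunctive information about “scout” is simplified to a single plural reading.

```python
def combine(*feature_sets):
    """Square union: accumulate the (feature, value) pairs of all arguments."""
    result = set()
    for features in feature_sets:
        result |= set(features)
    return frozenset(result)


def subsumes(phi_xi, phi_eta):
    """phi(xi) is contained in phi(eta): eta has at least all features and values of xi."""
    return frozenset(phi_xi) <= frozenset(phi_eta)


def consistent(features):
    """A feature set is inconsistent if some feature carries two different values."""
    seen = {}
    for name, value in features:
        if seen.setdefault(name, value) != value:
            return False
    return True


# "the boy" is masculine, third person singular
boy = {("gender", "masculine"), ("person", "third"), ("number", "singular")}
# one reading of the verb form "scout" (simplified to: plural)
scout_verb = {("number", "plural")}

agreement = combine(boy, scout_verb)   # contains both singular and plural
```

Here `consistent(boy)` holds, while `consistent(agreement)` fails because number is both singular and plural; an item annotated with `agreement` would therefore be rejected.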

We will now extend the Earley parsing schema with the possibility to
include features of constituents, productions and items. The parsing schema
is defined by a parsing system $\mathbb{P}_{Earley} = \langle \mathcal{I}_{Earley}, H, D_{Earley} \rangle$ for an arbitrary
context-free grammar $G$, where the set $H$ is determined by the string to be
parsed. The domain is defined by

$\mathcal{I}_{Earley} = \{[A \to \alpha \bullet \beta, i, j]_\xi \mid A \to \alpha\beta \in P \wedge 0 \le i \le j \wedge \varphi_0(A \to \alpha\beta) \sqsubseteq \varphi(\xi) \wedge consistent(\varphi(\xi))\}$.

The $\xi$ symbol is used only for easy reference. Subscripting $[A \to \alpha \bullet \beta, i, j]$ with $\xi$
means that we may refer to the item as $\xi$ in the remainder of the formula. The
unabbreviated, somewhat more cumbersome notation for the same definition
is

$\mathcal{I}_{Earley} = \{[A \to \alpha \bullet \beta, i, j] \mid A \to \alpha\beta \in P \wedge 0 \le i \le j \wedge \varphi_0(A \to \alpha\beta) \sqsubseteq \varphi([A \to \alpha \bullet \beta, i, j]) \wedge consistent(\varphi([A \to \alpha \bullet \beta, i, j]))\}$.

In words: it is mandatory that all features of a production be contained in 
an item that is based on that production. The item may have other features 
as well, as long as this does not lead to an inconsistency. 

The deduction steps are the usual context-free deduction steps, annotated 
with how the features of the consequent are determined by the features of 
the antecedents: 

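$D^{Init} = \{\vdash [S \to \bullet\gamma, 0, 0]_\xi \mid \varphi(\xi) = \varphi_0(S \to \gamma)\}$,

$D^{Scan} = \{[A \to \alpha \bullet a\beta, i, j]_\eta, [a, j, j+1]_\zeta \vdash [A \to \alpha a \bullet \beta, i, j+1]_\xi \mid \varphi(\xi) = \varphi(\eta) \sqcup \varphi(a_\zeta)\}$,

$D^{Compl} = \{[A \to \alpha \bullet B\beta, i, j]_\eta, [B \to \gamma\bullet, j, k]_\zeta \vdash [A \to \alpha B \bullet \beta, i, k]_\xi \mid \varphi(\xi) = \varphi(\eta) \sqcup \varphi(B_\zeta)\}$,

$D^{Pred} = \{[A \to \alpha \bullet B\beta, i, j]_\eta \vdash [B \to \bullet\gamma, j, j]_\xi \mid \varphi(\xi) = \varphi(B_\eta) \sqcup \varphi_0(B \to \gamma)\}$,

$D_{Earley} = D^{Init} \cup D^{Scan} \cup D^{Compl} \cup D^{Pred}$.

The items have been subscripted with identifiers $\xi$, $\eta$, $\zeta$ for easy reference.
The notation $\varphi(X_\eta)$ is used for those features of the item $\eta$ that relate to
constituent $X$.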



7.2 The example grammar $UG_1$

We will look at a very simple example of a unification grammar. Our example 
grammar does not pretend to have any linguistic relevance. Moreover, the 
example deviates slightly from the usual examples as given by, e.g., Shieber 
[1986]. It is not our purpose to advocate the felicity of unification grammars 
to encode linguistic phenomena, but to show how context-free backbones of
natural language grammars can be enhanced with features. Hence, we take
the context-free example grammar that has been used in Chapter 2 and simply
add features to that grammar.

The Earley schema of the previous section is too advanced, for the time 
being, and we will parse strictly bottom-up in CYK fashion. If constituents 
B and C are known for a production $A \to BC$, then A can be recognized and
an appropriate feature structure for it will be constructed. 

Different features of a constituent can be stored in a feature structure.
For each word in the language, the lexicon contains a feature structure². The
lexicon entry for the word “catches”, for example, might look as follows:

catches ↦ [ cat     : *v
            head    : [ tense : present
                        agr   : [1] [ number : singular
                                      person : third ] ]
            subject : [ head : [ agr : [1] ] ]
            object  : [ ] ]


Features are listed in an attribute-value matrix (avm). Every word has a 
feature cat describing the syntactic category. “Catches” has a feature head 
that contains some relevant information about the verb form. Furthermore, 
there are features subject and object, describing properties of the subject and
direct object of the verb. The value of a feature can be some atomic symbol 
(as for cat); an avm (as for head and subject), or unspecified (as for object). 
Unspecified features are denoted by an empty AVM, also called a variable. The 
intended meaning, in this case, is that the verb catches does have a direct 
object, but its features do not matter. 

An important notion in avms is coreference (indicated by numbers con- 
tained in boxes). In the above example, the head agr feature is coreferenced 
with subject head agr, meaning that the agreement features of “catches” must 
be shared with the agreement features of its subject. Note, furthermore, that 
an entry within a nested structure of AVMs can be addressed by means of a 
feature path. 
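
As an informal illustration of coreference and feature paths, the AVM for “catches” can be mimicked with nested dictionaries. This is only a sketch under assumptions made here: coreference is modelled by letting two paths lead to the very same Python object, the category value `*v` and the helper name `follow` are hypothetical choices, and an empty dictionary stands for an unspecified value.

```python
def follow(avm, path):
    """Return the value addressed by a feature path such as 'subject head agr'."""
    node = avm
    for feature in path.split():
        node = node[feature]
    return node


# The value tagged with the coreference box: one object, reachable along two paths.
agr = {"number": "singular", "person": "third"}

catches = {
    "cat": "*v",
    "head": {"tense": "present", "agr": agr},
    "subject": {"head": {"agr": agr}},   # shared with (head agr)
    "object": {},                        # unspecified: any direct object will do
}
```

For example, `follow(catches, "head agr number")` yields `"singular"`, and `follow(catches, "head agr") is follow(catches, "subject head agr")` holds because both paths end in the same object. Sharing of this kind is stronger than two separate values that merely look alike: information added along one path is automatically present along the other.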



² If several different feature structures coexist for the same word, we will simply
treat these as belonging to separate (homonym) words. Disjunction within fea- 
ture structures is discussed in Section 9.4. While (a limited form of) disjunction 
is very useful for practical purposes, one can always interpret feature structures 
with disjunction as a compact representation of a set of non-disjunctive feature 
structures. Hence, from a theoretical point of view, disallowing disjunction is no 
limitation to the power of the formalism. 

A first, very simple lexicon for the remainder of our canonical example
sentence “the cat catches a mouse” is as follows:

the, a ↦ [ cat : *det ]

cat, mouse ↦ [ cat  : *n
               head : [ agr : [ number : singular
                                person : third ] ] ]


In order to parse the sentence, we need productions that tell us what to 
do with the features when we construct constituents. The syntactic categories 
of constituents are expressed by means of features, just like all other char- 
acteristic information. A formal, but somewhat austere way to express the 
construction of an NP from *det and *n is the following: 

$X_0 \to X_1\ X_2$

$\langle X_0\ cat \rangle = NP$

$\langle X_1\ cat \rangle = *det$    (7.1)

$\langle X_2\ cat \rangle = *n$

$\langle X_0\ head \rangle = \langle X_2\ head \rangle$.

That is, if we have constituents $X_1$, $X_2$ with cat features *det and *n,
respectively, we may create a new constituent with cat feature NP. Moreover,
the head of $X_0$ is shared with the head of $X_2$.³

In most, if not all grammars it will be the case that all constituents have 
a cat feature. Hence we can simplify the notation of production (7.1) to 

$NP \to *det\ *n$

$\langle NP\ head \rangle = \langle *n\ head \rangle$.    (7.2)

The meaning of (7.1) and (7.2) is identical; the expression $\langle X_i\ cat \rangle = A$ can
be deleted when we substitute an $A$ for $X_i$ in the production. Thus we obtain
context-free productions as usual, enhanced with so-called constraints that
describe how the feature structures of the different constituents are related to
one another. Hence, for the noun phrase “the cat” we may construct a feature
structure with category NP and the head feature taken from the noun “cat”:

the cat ↦ [ cat  : NP
            head : [ agr : [ number : singular
                             person : third ] ] ]

and similarly for “a mouse”. For the construction of a VP, in the same vein, we
employ the following production annotated with constraints:

³ In Chapter 8 we will make a distinction between type identity (denoted =) and
token identity (denoted =). As the distinction is not very relevant here, its intro- 
duction is postponed until Section 8.2, where we have developed the convenient 
terminology. 

$VP \to *v\ NP$

$\langle VP\ head \rangle = \langle *v\ head \rangle$

$\langle VP\ subject \rangle = \langle *v\ subject \rangle$

$\langle *v\ object \rangle = \langle NP \rangle$

The verb phrase “catches a mouse” shares its head and subject features with
the verb, while the entire (feature structure of the) NP is taken to be the
direct object:

catches a mouse ↦ [ cat     : VP
                    head    : [ tense : present
                                agr   : [1] [ number : singular
                                              person : third ] ]
                    subject : [ head : [ agr : [1] ] ]
                    object  : [ cat  : NP
                                head : [ agr : [ number : singular
                                                 person : third ] ] ] ]


A sentence, finally, can be constructed from an NP and VP as follows: 

$S \to NP\ VP$

$\langle S\ head \rangle = \langle VP\ head \rangle$

$\langle VP\ subject \rangle = \langle NP \rangle$

The sentence shares its head with the VP. The subject feature of the VP is
shared with all features of the NP. Note that (by coreference) the subject
of the verb phrase has $\langle head \rangle$ agreement third person singular. An NP can
be substituted for the subject only if it has the same agreement. If the NP
were to have a feature $\langle head\ agr\ number \rangle$ with value plural, then the S would
obtain both singular and plural as values for its $\langle head\ agr\ number \rangle$ feature
(because it is shared with the $\langle subject\ head\ agr\ number \rangle$ feature of the VP,
which is shared with the $\langle head\ agr\ number \rangle$ feature of the NP). Such a clash
of values would constitute an inconsistency, as discussed in Section 7.1. As a
feature structure for S we obtain



the cat catches a mouse ↦ [ cat  : S
                            head : [ tense : present
                                     agr   : [ number : singular
                                               person : third ] ] ]
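
The construction of S, including the clash that arises for a plural subject, can be sketched as follows. This is a deliberately naive illustration and not the formal treatment of Chapter 8: feature structures are assumed to be nested dictionaries with atomic strings, coreference is assumed to be plain object sharing, and the names `unify`, `Clash`, `noun_phrase`, `verb_phrase` and `sentence` are hypothetical. The merge is destructive, so a failed attempt may leave its first argument partly modified.

```python
class Clash(Exception):
    """Two atomic values conflict: the result would be inconsistent."""


def unify(a, b):
    """Merge feature structure b into a (destructively) and return the result.

    Assumed encoding: nested dicts with string atoms; {} is an unspecified
    value; coreference is object sharing, so an update of a shared dict is
    visible along every path that reaches it.
    """
    if a is b:
        return a
    if isinstance(a, dict) and isinstance(b, dict):
        for feature, value in b.items():
            a[feature] = unify(a[feature], value) if feature in a else value
        return a
    if a == {}:          # an unspecified value meets an atom
        return b
    if b == {}:
        return a
    if a == b:
        return a
    raise Clash(f"{a!r} conflicts with {b!r}")


def noun_phrase(number):
    return {"cat": "NP", "head": {"agr": {"number": number, "person": "third"}}}


def verb_phrase():
    agr = {"number": "singular", "person": "third"}   # the coreferenced value
    return {"cat": "VP",
            "head": {"tense": "present", "agr": agr},
            "subject": {"head": {"agr": agr}}}


def sentence(np, vp):
    """S -> NP VP with <S head> = <VP head> and <VP subject> = <NP>."""
    unify(vp["subject"], np)
    return {"cat": "S", "head": vp["head"]}
```

Calling `sentence(noun_phrase("singular"), verb_phrase())` returns a structure with category S whose head is the head of the VP, whereas `sentence(noun_phrase("plural"), verb_phrase())` raises `Clash`, because the number feature would receive both singular and plural.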





The entire sentence appears to have fewer features than its constituent parts
NP and VP. That is because some features were present only to guarantee
agreement between subject and verb. As the sentence has been produced,

the ↦ [ cat  : *det
        head : [ trans : [ det : + ] ] ]

a ↦ [ cat  : *det
      head : [ trans : [ det : − ] ] ]

cat ↦ [ cat  : *n
        head : [ agr   : [ number : singular
                           person : third ]
                 trans : [ pred : cat ] ] ]

mouse ↦ [ cat  : *n
          head : [ agr   : [ number : singular
                             person : third ]
                   trans : [ pred : mouse ] ] ]

catches ↦ [ cat     : *v
            head    : [ tense : present
                        agr   : [1] [ number : singular
                                      person : third ]
                        trans : [ pred : catch
                                  arg1 : [2] [ ]
                                  arg2 : [3] [ ] ] ]
            subject : [ head : [ agr   : [1]
                                 trans : [2] ] ]
            object  : [ head : [ trans : [3] ] ] ]

Fig. 7.1. Part of the lexicon for $UG_1$
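
$S \to NP\ VP$
    $\langle S\ head \rangle = \langle VP\ head \rangle$
    $\langle VP\ subject \rangle = \langle NP \rangle$

$VP \to *v\ NP$
    $\langle VP\ head \rangle = \langle *v\ head \rangle$
    $\langle VP\ subject \rangle = \langle *v\ subject \rangle$
    $\langle *v\ object \rangle = \langle NP \rangle$

$NP \to *det\ *n$
    $\langle NP\ head \rangle = \langle *n\ head \rangle$
    $\langle *n\ head\ trans \rangle = \langle *det\ head\ trans \rangle$

Fig. 7.2. Some productions of $UG_1$
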



the agreement must have been okay, hence there is no need to retain this
information explicitly in the feature structure for an S.

Above we have shown how syntactic constraints can be incorporated into
the features of a grammar. We will also give an example of how semantic
information can be collected from the lexicon and transferred upwards to
contribute to the semantics of the sentence. We will use a very simple
unification grammar $UG_1$. A relevant part of the lexicon for $UG_1$ is shown in
Figure 7.1; the productions annotated with constraints are shown in Figure 7.2.
The head of each feature structure is extended with a feature trans(lation),
which is only a first, easy step towards translation of the constituent to its
corresponding semantics. The translation of a verb is a predicate with the
(translation of the) subject as first argument and the (translation of the)
object as second argument.

The production $NP \to *det\ *n$ has been extended with another clause,
stating that the head trans features of *det and *n are to be shared. Thus
we obtain, for example

a mouse ↦ [ cat  : NP
            head : [ agr   : [ number : singular
                               person : third ]
                     trans : [ pred : mouse
                               det  : − ] ] ]

Because the translations of the subject and object are used as arguments for
the translation of the verb, the relevant properties of subject and object are
moved upward to a feature structure for the entire sentence. The reader may
verify that, following the same steps as before, we obtain

the cat catches a mouse ↦ [ cat  : S
                            head : [ tense : present
                                     agr   : [ number : singular
                                               person : third ]
                                     trans : [ pred : catch
                                               arg1 : [ pred : cat
                                                        det  : + ]
                                               arg2 : [ pred : mouse
                                                        det  : − ] ] ] ]




Other features can be added likewise. We can add a modifier feature to 
the translation, in which modifiers like adjectives, adverbs and prepositional 
phrases can be stored. For a noun phrase “the very big, blue cat” we could 
envisage a feature structure as in Figure 7.3. 

A noun phrase can include any number of modifiers, hence these are 
stored by means of a list. More sophisticated feature structure formalisms,
such as HPSG [Pollard and Sag, 1988], have special constructs for lists. Such
constructs are convenient for notation, but not necessary. As shown in Figure 
7.3, lists can be expressed in the basic formalism as well. In Section 9.5 a more 
complicated example is shown where lists are used for subcategorization of 
verbs. 
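
The encoding of lists by means of the features first and rest can be sketched as follows. The end marker `no` is taken from Figure 7.3; the dictionary representation and the function names `to_feature_list` and `from_feature_list` are assumptions made for illustration.

```python
def to_feature_list(items, nil="no"):
    """Encode a Python list as nested first/rest feature structures."""
    result = nil
    for item in reversed(items):
        result = {"first": item, "rest": result}
    return result


def from_feature_list(structure, nil="no"):
    """Decode a first/rest structure back into a Python list."""
    items = []
    while structure != nil:
        items.append(structure["first"])
        structure = structure["rest"]
    return items


modifiers = to_feature_list(["big", "blue"])
```

Here `modifiers` equals `{"first": "big", "rest": {"first": "blue", "rest": "no"}}`. No special list construct is needed: a list of any length is an ordinary feature structure whose depth grows with the number of elements.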



[ cat  : NP
  head : [ agr   : [ number : singular
                     person : third ]
           trans : [ pred : cat
                     det  : +
                     mod  : [ first : [ trans : big
                                        mod   : [ first : [ trans : very
                                                            mod   : no ]
                                                  rest  : no ] ]
                              rest  : [ first : [ trans : blue
                                                  mod   : no ]
                                        rest  : no ] ] ] ] ]

Fig. 7.3. Feature structure of “the very big, blue cat”

The last decade and a half has witnessed an overwhelming number of
different, but related unification grammar formalisms. Our informal introduction
in Chapter 7 was based on PATR [Shieber, 1986], which is the smallest
and simplest of these formalisms. Unlike formalisms such as LFG [Kaplan and
Bresnan, 1982], GPSG [Gazdar et al., 1985] or HPSG [Pollard and Sag, 1987,
1994], PATR was not primarily designed to capture some universal linguistic
structure, but merely as a small, clean formalism that covers the essential
properties found in most other unification grammars.

The logical foundations of constraint-based formalisms have been discussed
by Kasper and Rounds [1986], Smolka [1989, 1992] and Johnson [1991],
who give various axiomatizations of feature structures in predicate logic. In
such a logical approach, one describes a constraint language in which constraints
can be expressed. Such constraints are formulae in first-order logic
with equality. Constraints state that certain features must have certain values
or be equal to certain other features. The semantic interpretation of such a
formula (following Smolka) is a feature graph. The most interesting property
is satisfiability. For a given formula it has to be decided whether a feature
graph exists that is a model of the constraint.

A more fundamental treatment is given by Shieber [1992], who starts with 
the logical requirements for unification-based grammars and then sets out to 
investigate which models would be appropriate. 

Our purpose, in this chapter and the next, is a rather different one. We 
will investigate how, for a given class of unification grammars, efficient parsers 
can be developed, by means of parsing schemata. Just like in the context- 
free case, we will be concerned with the question which items one likes to 
derive and which rules should be used for that. In addition, we extend the 
formalism with a notation that allows explicit specification of transfer of 
features between items. 

Parsing of unification grammars is a combination of two problem areas,
each of which is complex in itself. Parsing is our primary interest, and the
linguistic and logical properties of unification grammars are secondary. Hence we
do not worry about how to specify suitable unification grammars for natural
languages, nor are we particularly concerned with the logical properties of
various unification grammar formalisms, but we assume a simple kind of
unification grammar and address the question how efficient parsers can be
defined.

In order to be precise we will give a detailed, formal account of our simple 
formalism, that establishes thoroughly what we have presented informally in 
Chapter 7. The results are virtually equal to those of Smolka and others, but 
we employ a rather more computational view and do not pretend to give a 
general treatise on unification grammars. 

We do not make a distinction between syntax (constraints) and semantics
(feature graphs); we see both domains as syntactic domains. The notion
of satisfiability is replaced by consistency. There is a simple isomorphism
between consistent constraints¹ and well-formed feature graphs. Thus we obtain
an abstract notion of a feature structure that may materialize in two
different avatars: either as a graph or as a constraint. We switch representation
opportunistically to the domain that is most convenient at any given
moment. For the purpose of (statically) describing a grammar, the constraint
representation is the most useful. But the dynamics of a grammar, describing
how a parse is to be obtained by unification of feature structures, are most
easily understood in the feature graph domain.

Feature structures, both as graphs and constraint sets, are introduced 
in 8.1. For both representations we define a lattice and prove these to be 
isomorphic in 8.2. For a proper formalization of how features of different 
objects may relate to one another, we introduce composite feature structures 
in 8.3 and define lattices in 8.4. This formalism is used to define unification 
grammars in 8.5. Tree composition in Primordial Soup fashion is discussed
in 8.6 and parsing schemata, finally, are defined in 8.7.

In 8.8, at last, we give another example. The canonical example sentence 
is parsed with grammar UGi (cf. Section 7.2) using an Earley-type parsing 
schema (cf. Section 7.1). An overview of other grammar formalisms is pre- 
sented in 8.9, related approaches are briefly discussed in 8.10, and conclusions 
are summarized in 8.11. 



8.1 Feature structures 

We will give two different formalizations of feature structures, as constraint
sets and feature graphs, and prove these to be isomorphic. The attribute-value
matrix (AVM) notation will be used as a convenient, informal notation to
denote feature structures. The correspondence between AVMs, feature graphs
and constraint sets is straightforward. In Figure 8.1 an AVM is shown with
corresponding constraint set and feature graph.

¹ From Section 8.1 onwards, we will call these constraint sets. A constraint as a
formula in first order logic with equality can be seen as a conjunction of a series
of atomic constraints. For our purposes it will be more convenient to describe
this as a set of atomic constraints, rather than a conjunction.

In Figure 8.1(a)-(c) it is exemplified how the information contained in an 
AVM can be encoded in a graph. The features are represented by edges; the 
atomic values are represented by labels of terminal vertices. Internal vertices 
carry no label; their value is the feature structure represented by the outgoing 
edges. The root vertex can be labelled with an identifier for the object whose 
features are represented here. 

In order to give a formal definition of the domain of feature graphs, we 
first introduce some auxiliary domains from which features and values can 
be drawn. 

Definition 8.1. (features, constants)

$\mathit{Fea}$ denotes a finite set of features. We write $f, g, h, \ldots$ for elements of $\mathit{Fea}$.
$\mathit{Const}$ denotes a finite set of constants. We write $c, d, e, \ldots$ for elements of
$\mathit{Const}$.

It is assumed that $\mathit{Fea}$ and $\mathit{Const}$ are disjoint sets. Furthermore, we assume
that a linear order has been defined on both sets $\mathit{Fea}$ and $\mathit{Const}$.
In the sequel we will also need sequences of features. We write $\pi, \rho$ for elements
of $\mathit{Fea}^*$. A linear order on $\mathit{Fea}^*$ is defined by the “lexicographic order” based
on the linear order of $\mathit{Fea}$:

(i) $\pi < \pi\rho$ for non-empty feature sequences $\rho$;

(ii) $\pi f \rho < \pi g \rho'$ if $f < g$.

This linear order on feature sequences will be used to define a suitable normal
form for constraint sets. □
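The order on feature sequences can be tried out directly. The following is a minimal sketch, assuming that features are represented as strings, that the linear order on $\mathit{Fea}$ is alphabetical order on feature names, and that a feature sequence is a Python tuple; none of these representation choices come from the text.

```python
# Feature sequences as tuples of feature names. The linear order on Fea is
# assumed here to be alphabetical order on the names (an illustrative choice).
paths = [
    ("subject", "head", "agr"),
    ("head", "agr"),
    ("head",),
    ("head", "agr", "number"),
    (),
]

# Python compares tuples element by element and ranks a proper prefix before
# any of its extensions, which matches clauses (i) and (ii) of the definition.
ordered = sorted(paths)

# The smallest non-empty sequence in the list.
smallest = min(p for p in paths if p)
```

Because tuple comparison already behaves this way, a later normal-form computation can select the lowest path to a vertex with a plain `min`.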

We recall some useful notions from graph theory and introduce appropri- 
ate notations. 

Definition 8.2. (dags)

A directed graph is a pair $\Gamma = (U, E)$, with $U$ a set of vertices^ and $E$ a set
of edges. An edge is a directed pair $(u, v)$ with $u, v \in U$. Usually we write
$u \to v$ for $(u, v) \in E$.

A (possibly empty) finite sequence of edges $u_0 \to u_1, u_1 \to u_2, \ldots, u_{k-1} \to u_k$ is
called a path. We write $u \longrightarrow v$ for a path from $u$ to $v$.

A directed graph is called cyclic if there is a non-empty path $u \longrightarrow u$ for
some vertex $u \in U$. A graph is acyclic if it is not cyclic. We write DAG as
abbreviation for a directed acyclic graph.

A root of a graph is a vertex $u$ such that for all $v \in U$ there is a path from $u$
to $v$.

A DAG is called rooted if it has exactly one root.

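An edge $u \to v$ is an outgoing edge of $u$ and an incoming edge of $v$.

A leaf is a vertex with no outgoing edges. □

^ We write $U$ rather than $V$ for the set of vertices, because $V$ denotes the grammar
variables $N \cup \Sigma$.

(a) an attribute value matrix

$$
\begin{bmatrix}
\mathit{cat}: & {*v} \\
\mathit{head}: & \begin{bmatrix} \mathit{tense}: & \mathit{present} \\ \mathit{agr}: & \boxed{1} \begin{bmatrix} \mathit{number}: & \mathit{singular} \\ \mathit{person}: & \mathit{third} \end{bmatrix} \end{bmatrix} \\
\mathit{subject}: & \begin{bmatrix} \mathit{head}: & \begin{bmatrix} \mathit{agr}: & \boxed{1} \end{bmatrix} \end{bmatrix} \\
\mathit{object}: & [\ ]
\end{bmatrix}
$$

(b) a constraint set

$$
\begin{array}{l}
\{\ \langle X\ \mathit{cat} \rangle = {*v}, \\
\ \ \langle X\ \mathit{head}\ \mathit{tense} \rangle = \mathit{present}, \\
\ \ \langle X\ \mathit{head}\ \mathit{agr}\ \mathit{number} \rangle = \mathit{singular}, \\
\ \ \langle X\ \mathit{head}\ \mathit{agr}\ \mathit{person} \rangle = \mathit{third}, \\
\ \ \langle X\ \mathit{subject}\ \mathit{head}\ \mathit{agr} \rangle = \langle X\ \mathit{head}\ \mathit{agr} \rangle, \\
\ \ \langle X\ \mathit{object} \rangle = [\ ]\ \}
\end{array}
$$

(c) a feature graph

[Feature graph rooted at $X$; the drawing is not reproduced here.]

Fig. 8.1. Three different representations of the same feature structure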

Definition 8.3. (feature graphs)

$\mathcal{FG}$ is the class of finite, rooted dags with the following properties:

(i) every edge is labelled with a feature;

(ii) if $f$ and $g$ are labels of distinct edges originating from the same vertex, then
$f \neq g$;

(iii) leaves may be (but need not be) labelled with a constant;
non-leaf vertices do not carry a label.

We write $u \xrightarrow{f} v$ if $u \to v$ is labelled $f$; we write $u \xrightarrow{\pi} v$ if the sequence of steps
from $u$ to $v$ is labelled with a sequence of features $\pi$. We write $\mathit{label}(u) = c$
if $u$ is labelled with constant $c$ and $\mathit{label}(u) = \varepsilon$ if $u$ carries no label.

We write $\Gamma(X)$ for a feature graph that denotes the features of some (here
unspecified) object $X$. □



An example of a constraint set was shown in Figure 8.1(b). In the defini- 
tion of a constraint set, we have included a parameter X that can be used to 
identify an object for which constraints are to be specified. We will not use 
this parameter for a while, but include it here in anticipation of composite 
constraint sets that will be defined in Section 8.3. 

Definition 8.4. (constraint set)

Let $X$ be a (not further specified) object. Constraints on $X$ can be drawn
from different domains:

• The domain of value constraints $\mathcal{VC}$ is defined by

$\mathcal{VC} = \{ \langle X\pi \rangle = c \mid \pi \in \mathit{Fea}^* \wedge c \in \mathit{Const} \}$;

• The domain of coreference constraints $\mathcal{CC}$ is defined by

$\mathcal{CC} = \{ \langle X\pi \rangle = \langle X\rho \rangle \mid \pi, \rho \in \mathit{Fea}^* \}$.

A constraint set $\chi(X)$ is a finite subset of $\mathcal{VC} \cup \mathcal{CC}$.

As a convenient notation^ we write $\langle X\pi \rangle = [\ ]$ for constraints of the form
$\langle X\pi \rangle = \langle X\pi \rangle$.

As an ad-hoc general notation we write $\langle X\pi \rangle = \mu$ for a constraint, where $\mu$
can be of the form $c$, $[\ ]$, or $\langle X\rho \rangle$. □

^ Alternatively we could introduce a separate domain of existential constraints of
the form $\langle X\pi \rangle = [\ ]$, with $[\ ]$ a symbol not in $\mathit{Fea}$ and $\mathit{Const}$, and omit constraints
of the form $\langle X\pi \rangle = \langle X\pi \rangle$ from the coreference constraints. To state that “X has
an object” is more to the point than to state that “the object of X is coincident
with the object of X,” but the logical implications are the same. By seeing the
former as a notational variant for the latter, we can combine the more intuitive
notation with the simpler formal model.

Next we define the closure of a constraint set. It is closed in the sense that
every single piece of information that can be drawn from the given constraints
is added explicitly as a separate constraint.

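Definition 8.5. (closure of a constraint set)

Let $\chi(X) \subseteq \mathcal{VC} \cup \mathcal{CC}$ be a constraint set. The closure of $\chi(X)$, denoted
$\mathit{closure}(\chi(X))$, is the smallest set satisfying^

(i) $\langle X \rangle = [\ ] \in \mathit{closure}(\chi(X))$;

(ii) if $\langle X\pi \rangle = \mu \in \chi(X)$ then $\langle X\pi \rangle = \mu \in \mathit{closure}(\chi(X))$;

(iii) if $\langle X\pi \rangle = \langle X\pi' \rangle \in \mathit{closure}(\chi(X))$ and $\langle X\pi\rho \rangle = \mu \in \mathit{closure}(\chi(X))$
then $\langle X\pi'\rho \rangle = \mu \in \mathit{closure}(\chi(X))$;

(iv) if $\langle X\pi \rangle = \langle X\rho \rangle \in \mathit{closure}(\chi(X))$ then $\langle X\rho \rangle = \langle X\pi \rangle \in \mathit{closure}(\chi(X))$;

(v) if $\langle X\pi\rho \rangle = \mu \in \mathit{closure}(\chi(X))$ then $\langle X\pi \rangle = [\ ] \in \mathit{closure}(\chi(X))$.

A constraint set $\chi(X)$ is called closed if $\mathit{closure}(\chi(X)) = \chi(X)$. □

Note that $\mathit{closure}(\chi(X))$ need not be a constraint set according to Definition
8.4: it could be an infinite set. If, for example, $\langle X\pi \rangle = \langle X\pi\rho \rangle \in \chi(X)$ then,
by (iii), we obtain $\langle X\pi\rho \rangle = \langle X\pi\rho\rho \rangle \in \mathit{closure}(\chi(X))$,
$\langle X\pi\rho\rho \rangle = \langle X\pi\rho\rho\rho \rangle \in \mathit{closure}(\chi(X))$, and so forth.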

The closure of the constraint set in Figure 8.1(b) is shown in Figure 8.2.
The purpose of the “existential” constraints added in (v) is to identify the
existence of all substructures. We will use them for the transformation of a
constraint set into a graph.
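The closure rules can be read as a fixpoint computation. The sketch below is one possible encoding, not the procedure used in the text: a constraint is assumed to be a pair `(path, rhs)` with `path` a tuple of features and `rhs` either `("const", c)` or `("path", other_path)`, so that the existential constraint for a path is `(path, ("path", path))`. Because a closure can be infinite, the hypothetical parameter `max_len` cuts off left-hand sides derived by rule (iii) beyond a given length.

```python
def closure(chi, max_len=6):
    """Fixpoint of closure rules (i)-(v), truncated at paths of length max_len."""
    result = {((), ("path", ()))} | set(chi)      # rules (i) and (ii)
    while True:
        new = set()
        for lhs, rhs in result:
            # rule (v): every prefix of a left-hand side exists
            for k in range(len(lhs) + 1):
                new.add((lhs[:k], ("path", lhs[:k])))
            # rule (iv): coreference is symmetric
            if rhs[0] == "path":
                new.add((rhs[1], ("path", lhs)))
        # rule (iii): what holds below a path also holds below a coreferent path
        for p, rhs1 in result:
            if rhs1[0] != "path":
                continue
            for q, mu in result:
                if q[:len(p)] == p:
                    target = rhs1[1] + q[len(p):]
                    if len(target) <= max_len:
                        new.add((target, mu))
        if new <= result:
            return result
        result |= new


# Example: { <X f> = c, <X g> = <X f> }
chi = {(("f",), ("const", "c")), (("g",), ("path", ("f",)))}
closed = closure(chi)

# A cyclic constraint <X> = <X f> keeps producing longer paths; only the
# cut-off at max_len makes the computation stop.
cyclic = closure({((), ("path", ("f",)))}, max_len=3)
```

In the first example the value `c` is propagated to `g`, and existential constraints appear for the root, `f`, and `g`. The cut-off is a convenience of this sketch; a practical implementation would rather detect the cycle and report inconsistency.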

Definition 8.6. (consistency)

A closed constraint set $\chi(X)$ is called consistent if it satisfies the following
properties:

(i) if $\langle X\pi \rangle = c \in \chi(X)$ and $\langle X\pi \rangle = d \in \chi(X)$ then $c = d$;

(ii) if $\langle X\pi \rangle = c \in \chi(X)$ and $\langle X\pi\rho \rangle = \mu \in \chi(X)$ then $\rho = \varepsilon$;

(iii) $\langle X\pi\rho \rangle = \langle X\pi \rangle$ is not in $\chi(X)$ for any $\pi$ and non-empty $\rho$.

An arbitrary constraint set $\chi(X)$ is called consistent if $\mathit{closure}(\chi(X))$ is
consistent.

We write $\mathcal{CCS}$ for the set of consistent constraint sets. □
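The three consistency conditions translate into a direct check on a closed set. This is an illustrative sketch under an assumed encoding (a constraint is `(path, rhs)`, with `path` a tuple of features and `rhs` either `("const", c)` or `("path", other_path)`); the function name `is_consistent` is hypothetical.

```python
def is_consistent(closed):
    """Check conditions (i)-(iii) of the consistency definition on a closed set."""
    values = {}
    for lhs, rhs in closed:
        if rhs[0] == "const":
            # (i): a path has at most one constant value
            if values.setdefault(lhs, rhs[1]) != rhs[1]:
                return False
    for lhs, rhs in closed:
        # (ii): nothing is specified below a path that has a constant value
        if any(lhs[:k] in values for k in range(len(lhs))):
            return False
        # (iii): a path must not corefer with a proper prefix of itself
        if rhs[0] == "path":
            other = rhs[1]
            if len(lhs) > len(other) and lhs[:len(other)] == other:
                return False
    return True


def exists(*path):
    # The existential constraint <X path> = [ ]
    return (path, ("path", path))


ok = {exists(), exists("f"), (("f",), ("const", "c"))}
clash = ok | {(("f",), ("const", "d"))}            # violates (i)
below = ok | {exists("f", "g")}                    # violates (ii)
# A fragment of the infinite closure of { <X f> = <X> }; it violates (iii)
cycle = {exists(), exists("f"), (("f",), ("path", ())), ((), ("path", ("f",)))}
```

The check presupposes that its input is already closed; applied to an arbitrary constraint set it could miss conflicts that only become visible after closure.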

Corollary 8.7.

If $\chi(X) \in \mathcal{CCS}$ then $\mathit{closure}(\chi(X)) \in \mathcal{CCS}$. □

Hence closure defines an equivalence relation on constraint sets: if $\mathit{closure}(\chi_1(X))
= \mathit{closure}(\chi_2(X))$ then $\chi_1$ and $\chi_2$ contain the same information. A particular
representative of an equivalence class, a constraint set in normal form, will be
introduced in Definition 8.12.

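^ For non-empty constraint sets, (i) is implied by (v). The empty set is closed by
adding the empty constraint. $\langle X \rangle = [\ ]$ does not convey any information about
$X$. This is usually taken as the bottom of a feature lattice, cf. Definition 8.14.
Note that transitivity (i.e., if $\langle X\pi \rangle = \langle X\pi' \rangle$, $\langle X\pi' \rangle = \langle X\pi'' \rangle$ in $\mathit{closure}(\chi(X))$
then also $\langle X\pi \rangle = \langle X\pi'' \rangle$) is a consequence of (iii) and (iv).

$$
\begin{array}{l}
\{\ \langle X \rangle = [\ ], \\
\ \ \langle X\ \mathit{cat} \rangle = [\ ], \\
\ \ \langle X\ \mathit{head} \rangle = [\ ], \\
\ \ \langle X\ \mathit{head}\ \mathit{tense} \rangle = [\ ], \\
\ \ \langle X\ \mathit{head}\ \mathit{agr} \rangle = [\ ], \\
\ \ \langle X\ \mathit{head}\ \mathit{agr}\ \mathit{number} \rangle = [\ ], \\
\ \ \langle X\ \mathit{head}\ \mathit{agr}\ \mathit{person} \rangle = [\ ], \\
\ \ \langle X\ \mathit{subject} \rangle = [\ ], \\
\ \ \langle X\ \mathit{subject}\ \mathit{head} \rangle = [\ ], \\
\ \ \langle X\ \mathit{subject}\ \mathit{head}\ \mathit{agr} \rangle = [\ ], \\
\ \ \langle X\ \mathit{subject}\ \mathit{head}\ \mathit{agr}\ \mathit{number} \rangle = [\ ], \\
\ \ \langle X\ \mathit{subject}\ \mathit{head}\ \mathit{agr}\ \mathit{person} \rangle = [\ ], \\
\ \ \langle X\ \mathit{object} \rangle = [\ ], \\
\ \ \langle X\ \mathit{cat} \rangle = {*v}, \\
\ \ \langle X\ \mathit{head}\ \mathit{tense} \rangle = \mathit{present}, \\
\ \ \langle X\ \mathit{head}\ \mathit{agr}\ \mathit{number} \rangle = \mathit{singular}, \\
\ \ \langle X\ \mathit{head}\ \mathit{agr}\ \mathit{person} \rangle = \mathit{third}, \\
\ \ \langle X\ \mathit{subject}\ \mathit{head}\ \mathit{agr}\ \mathit{number} \rangle = \mathit{singular}, \\
\ \ \langle X\ \mathit{subject}\ \mathit{head}\ \mathit{agr}\ \mathit{person} \rangle = \mathit{third}, \\
\ \ \langle X\ \mathit{head}\ \mathit{agr} \rangle = \langle X\ \mathit{subject}\ \mathit{head}\ \mathit{agr} \rangle, \\
\ \ \langle X\ \mathit{subject}\ \mathit{head}\ \mathit{agr} \rangle = \langle X\ \mathit{head}\ \mathit{agr} \rangle\ \}
\end{array}
$$

Fig. 8.2. Closure of the constraint set in Figure 8.1(b)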



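Definition 8.8. (mapping constraint sets to feature graphs)

For each consistent constraint set $\chi(X) \in \mathcal{CCS}$ we define a graph, as
follows. Vertices correspond to sets of left-hand sides of constraints. These sets,
denoted $[\langle X\pi \rangle]$, are defined by

$[\langle X\pi \rangle] = \{ \langle X\pi \rangle \} \cup \{ \langle X\rho \rangle \mid \langle X\pi \rangle = \langle X\rho \rangle \in \mathit{closure}(\chi(X)) \}$.

The graph $\Gamma(X) = \mathit{graph}(\chi(X))$ is defined by

$U = \{ [\langle X\pi \rangle] \mid \langle X\pi \rangle = [\ ] \in \mathit{closure}(\chi(X)) \}$,

$E = \{ [\langle X\pi \rangle] \xrightarrow{f} [\langle X\pi f \rangle] \mid \langle X\pi f \rangle = [\ ] \in \mathit{closure}(\chi(X)) \}$.

The label of a vertex $[\langle X\pi \rangle]$ is defined by

$$
\mathit{label}([\langle X\pi \rangle]) =
\begin{cases}
c & \text{if } \langle X\pi \rangle = c \in \mathit{closure}(\chi(X)) \\
\varepsilon & \text{otherwise}
\end{cases}
$$
□

Lemma 8.9.

For each $\chi(X) \in \mathcal{CCS}$ it holds that $\mathit{graph}(\chi(X)) \in \mathcal{FG}$.

Proof. Direct from the following observations:

• if $\langle X\pi f \rangle = [\ ] \in \mathit{closure}(\chi(X))$ then also $\langle X\pi \rangle = [\ ] \in \mathit{closure}(\chi(X))$,
hence $E$ is properly defined with respect to $U$;

• if $[\langle X\pi \rangle] \xrightarrow{f} u$ and $[\langle X\pi \rangle] \xrightarrow{f} v$ then $u = v$;

• the graph has a root $[\langle X \rangle]$;

• there are no $\langle X\pi \rangle = c$ and $\langle X\pi \rangle = d$ with $c \neq d$, hence each label is
uniquely defined;

• moreover, if $\langle X\pi\rho \rangle = \mu \in \mathit{closure}(\chi(X))$ for non-empty $\rho$ then the
consistency of $\chi(X)$ guarantees that there is no $\langle X\pi \rangle = c \in \mathit{closure}(\chi(X))$,
hence $\mathit{label}([\langle X\pi \rangle]) = \varepsilon$. □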
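The mapping from a closed constraint set to a graph can be sketched as follows. The encoding is an assumption made for illustration: a constraint is `(path, rhs)`, with `path` a tuple of features and `rhs` either `("const", c)` or `("path", other_path)`; a vertex is the frozen set of coreferent paths; an edge is a triple `(source, feature, target)`. The input is assumed to be closed and consistent already.

```python
def graph(closed):
    """Build (vertices, edges, labels) from a closed, consistent constraint set."""
    def vertex(path):
        # The class of `path`: the path itself plus every path it corefers with
        cls = {path}
        cls |= {rhs[1] for lhs, rhs in closed
                if lhs == path and rhs[0] == "path"}
        return frozenset(cls)

    # Paths with an existential constraint <X path> = [ ]
    existing = {lhs for lhs, rhs in closed if rhs == ("path", lhs)}
    vertices = {vertex(p) for p in existing}
    # One edge per non-empty existing path, from its parent class, labelled
    # with its last feature
    edges = {(vertex(p[:-1]), p[-1], vertex(p)) for p in existing if p}
    # Only vertices with a constant get a label; all others are unlabelled
    labels = {vertex(lhs): rhs[1] for lhs, rhs in closed if rhs[0] == "const"}
    return vertices, edges, labels


# The closure of { <X f> = c, <X g> = <X f> }, written out by hand
closed = {
    ((), ("path", ())),
    (("f",), ("path", ("f",))),
    (("g",), ("path", ("g",))),
    (("f",), ("path", ("g",))),
    (("g",), ("path", ("f",))),
    (("f",), ("const", "c")),
    (("g",), ("const", "c")),
}
vertices, edges, labels = graph(closed)
shared = frozenset({("f",), ("g",)})
```

The two coreferent paths collapse into a single vertex, reached from the root by both an `f` edge and a `g` edge, which is exactly the sharing that the graph representation is meant to express.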

Definition 8.10. (mapping feature graphs to constraint sets)

For each feature graph $\Gamma(X) \in \mathcal{FG}$ we define a constraint set. To that end,
we label each vertex with an auxiliary path label. If there are several paths
to a vertex, we take the lowest one in lexicographical order. Formally: let $r$
be the root of $\Gamma(X)$, then

$\mathit{path\_label}(u) = \min \{ \pi \mid r \xrightarrow{\pi} u \}$.

A constraint set $\mathit{constraints}(\Gamma(X))$ is (uniquely) defined by

$\chi_v(X) = \{ \langle X\ \mathit{path\_label}(u) \rangle = c \mid \mathit{label}(u) = c \}$,

$\chi_e(X) = \{ \langle X\ \mathit{path\_label}(u) \rangle = [\ ] \mid u \text{ is a leaf} \wedge \mathit{label}(u) = \varepsilon \}$,

$\chi_c(X) = \{ \langle X\ \mathit{path\_label}(u) \rangle = \langle X\rho \rangle \mid r \xrightarrow{\rho} u \wedge \rho \neq \mathit{path\_label}(u) \}$,

$\chi(X) = \chi_v(X) \cup \chi_e(X) \cup \chi_c(X)$. □

Lemma 8.11.

For each graph $\Gamma(X) \in \mathcal{FG}$ it holds that $\mathit{constraints}(\Gamma(X)) \in \mathcal{CCS}$.

Proof. Let $\Gamma(X) \in \mathcal{FG}$. We verify the conditions for consistency of Definition
8.6. (i) follows from the definition of $\chi_v(X)$; (ii) holds because in $\Gamma(X)$ only leaves
are labelled; (iii) holds because the graph is acyclic. □
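Reading a constraint set off a graph can be sketched in the same spirit. The representation below is assumed for illustration only: vertices are arbitrary hashable identifiers, `edges` maps `(vertex, feature)` to the target vertex (so two edges from one vertex cannot share a feature), and `labels` maps labelled leaves to their constants. Feature sequences are tuples, compared with Python's built-in tuple order.

```python
def constraints(root, edges, labels):
    """Read the sets chi_v, chi_e and chi_c off a rooted feature dag."""
    # Enumerate every path from the root; this is finite because the graph
    # is acyclic.
    paths = {}                              # vertex -> list of feature paths
    stack = [(root, ())]
    while stack:
        u, pi = stack.pop()
        paths.setdefault(u, []).append(pi)
        for (src, f), v in edges.items():
            if src == u:
                stack.append((v, pi + (f,)))

    has_outgoing = {src for (src, _f) in edges}
    chi = set()
    for u, ps in paths.items():
        label = min(ps)                                      # path_label(u)
        if u in labels:
            chi.add((label, ("const", labels[u])))           # chi_v
        elif u not in has_outgoing:
            chi.add((label, ("path", label)))                # chi_e
        for rho in ps:
            if rho != label:
                chi.add((label, ("path", rho)))              # chi_c
    return chi


# A small dag in which the features f and g of the root share one substructure
edges = {(0, "f"): 1, (0, "g"): 1, (1, "f"): 2, (1, "g"): 3}
labels = {2: "c", 3: "d"}
chi = constraints(0, edges, labels)
```

Every left-hand side in the result is the lowest path to its vertex, so the output has the shape that the text goes on to call a normal form.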

Definition 8.12. (normal form)

The function $\mathit{nf} : \mathcal{CCS} \to \mathcal{CCS}$ is defined by

$\mathit{nf}(\chi(X)) = \mathit{constraints}(\mathit{graph}(\chi(X)))$.

$\mathit{nf}(\chi(X))$ can be thought of as the normal form of a constraint set. It is,
roughly speaking, a constraint set with constraints that are minimal in
lexicographical order. We write $\mathit{nf}\mathcal{CCS}$ for the set of constraint sets that satisfy
$\mathit{nf}(\chi(X)) = \chi(X)$. □

In order to compute a normal form, it is not necessary to construct a graph 
and then afterward deconstruct it. An algorithm to obtain the normal form 
of a constraint set is shown in Figure 8.3. It is left to the reader to verify the 
correctness of this algorithm; our main concern right now is the existence of 
the normal form, rather than its computation. 

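Lemma 8.13.

When we restrict graph to constraint sets in normal form, the functions

$\mathit{graph} : \mathit{nf}\mathcal{CCS} \to \mathcal{FG}$

$\mathit{constraints} : \mathcal{FG} \to \mathit{nf}\mathcal{CCS}$

are bijections. Moreover, they are each other’s inverse.

Proof: straightforward. □

procedure normalize $\chi(X)$
begin
    repeat each of the following steps
        replace $\langle X\pi \rangle = \langle X\rho \rangle$ by $\langle X\rho \rangle = \langle X\pi \rangle$
            if $\rho < \pi$;
        replace $\langle X\pi\rho \rangle = \mu$ by $\langle X\pi'\rho \rangle = \mu$
            if $\pi' < \pi$ and $\langle X\pi \rangle = \langle X\pi' \rangle \in \chi(X)$;
        delete $\langle X\pi\rho \rangle = \langle X\pi'\rho \rangle$ from $\chi(X)$
            if $\langle X\pi \rangle = \langle X\pi' \rangle \in \chi(X)$ and $\rho \neq \varepsilon$;
        delete $\langle X\pi \rangle = [\ ]$ from $\chi(X)$
            if $\langle X\pi\rho \rangle = \mu \in \chi(X)$ for some $\rho \neq \varepsilon$
            or if $\langle X\pi \rangle = c \in \chi(X)$
    until no more of these steps can be applied
end;

Fig. 8.3. A simple normalization procedure for constraint sets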



8.2 Feature lattices 

We will now define a lattice structure for constraint sets and feature graphs. 
First, we recall the definition of a lattice. 

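Definition 8.14. (lattice)

Let $\mathcal{X}$ be an arbitrary set (with elements $x, y, \ldots$) and $\sqsubseteq$ a partial order on
$\mathcal{X}$. The pair $(\mathcal{X}, \sqsubseteq)$ is called a lattice if

(i) There is a top element $T \in \mathcal{X}$ and a bottom element $B \in \mathcal{X}$ such that
$B \sqsubseteq x \sqsubseteq T$ for each $x \in \mathcal{X}$.

(ii) For each pair of elements $x, y \in \mathcal{X}$ there is a least upper bound (lub),
denoted $x \sqcup y$, that satisfies

(a) $x \sqsubseteq x \sqcup y$ and $y \sqsubseteq x \sqcup y$;

(b) for each $z$ such that $x \sqsubseteq z$ and $y \sqsubseteq z$ it holds that $x \sqcup y \sqsubseteq z$.

(iii) For each pair of elements $x, y \in \mathcal{X}$ there is a greatest lower bound (glb),
denoted $x \sqcap y$, that satisfies

(a) $x \sqcap y \sqsubseteq x$ and $x \sqcap y \sqsubseteq y$;

(b) for each $z$ such that $z \sqsubseteq x$ and $z \sqsubseteq y$ it holds that $z \sqsubseteq x \sqcap y$. □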

Definition 8.15. ($\mathit{nf}\mathcal{CCS}^{\bot}$, $\mathcal{FG}^{\bot}$)

We define a set $\bot_{ccs}$ by

$\bot_{ccs} = \mathcal{VC} \cup \mathcal{CC}$.

(This is not a constraint set according to Definition 8.4, as $\bot_{ccs}$ is not finite.)
We define a graph $\bot_{fg} = (U_{\bot}, E_{\bot})$ by

$U_{\bot} = \{ r \}$,

$E_{\bot} = \{ r \xrightarrow{f} r \mid f \in \mathit{Fea} \}$.

(This is not a feature graph according to Definition 8.3, as $\bot_{fg}$ is not a DAG.
The vertex $r$ can be thought of as labelled with all constants at once.)
Furthermore, we extend graph and constraints by

$\mathit{graph}(\bot_{ccs}) = \bot_{fg}$,

$\mathit{constraints}(\bot_{fg}) = \bot_{ccs}$.

We extend the domains of constraint sets and feature graphs by

$\mathit{nf}\mathcal{CCS}^{\bot} = \mathit{nf}\mathcal{CCS} \cup \{ \bot_{ccs} \}$,

$\mathcal{FG}^{\bot} = \mathcal{FG} \cup \{ \bot_{fg} \}$.

When it is clear from the context which domain is meant, we drop the index
and simply write $\bot$ for inconsistent. □

Definition 8.16. (subsumption)^

A subsumption relation $\sqsubseteq$ is defined on $\mathit{nf}\mathcal{CCS}^{\bot}$ by

$\chi_1(X) \sqsubseteq \chi_2(X)$ if $\mathit{closure}(\chi_1(X)) \subseteq \mathit{closure}(\chi_2(X))$.

A subsumption relation $\sqsubseteq$ is defined on $\mathcal{FG}^{\bot}$ by

$\Gamma_1(X) \sqsubseteq \Gamma_2(X)$ if $\mathit{constraints}(\Gamma_1(X)) \sqsubseteq \mathit{constraints}(\Gamma_2(X))$. □

Note that $\chi(X) \sqsubseteq \bot$ for any $\chi(X)$. It happens to be the case that $\bot$ is the
top element of the lattice structure over constraint sets. This is somewhat
unfortunate, because in lattice theory $\bot$ usually denotes the bottom element.
On the other hand, it is not uncommon to interpret $\bot$ as “inconsistent”. This
notational problem can be solved simply by reversing the lattice structure.
If we write $\sqsupseteq$ and $\sqcap$, rather than $\sqsubseteq$ and $\sqcup$, we have $\bot$ as the bottom of the
lattice. This is equally problematic, however, as it is not intuitively appealing
to write $\sqcap$ for a symbol that is to be interpreted as a union of constraints.
Hence we stick to the notation as introduced in Definition 8.16.

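Theorem 8.17. (lattice structure)

(a) $(\mathit{nf}\mathcal{CCS}^{\bot}, \sqsubseteq)$ is a lattice with bottom $\{ \langle X \rangle = [\ ] \}$ and top $\bot_{ccs}$.
(b) $(\mathcal{FG}^{\bot}, \sqsubseteq)$ is a lattice with bottom $\mathit{graph}(\{ \langle X \rangle = [\ ] \})$ and top $\bot_{fg}$.
(c) $\mathit{graph} : \mathit{nf}\mathcal{CCS}^{\bot} \to \mathcal{FG}^{\bot}$ is an isomorphism with respect to $\sqsubseteq$;
$\mathit{constraints} : \mathcal{FG}^{\bot} \to \mathit{nf}\mathcal{CCS}^{\bot}$ is the inverse isomorphism.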

^ There are many different ways in which subsumption is defined in the literature
(e.g. $\varphi_1 \sqsubseteq \varphi_2$ if there is a homomorphism from $\varphi_1$ to $\varphi_2$). The general idea is
that $\varphi_1 \sqsubseteq \varphi_2$ if the information (in whatever form) contained in $\varphi_1$ is a subset
of the information contained in $\varphi_2$. We have defined the closure as an auxiliary
construct to gather all information implied by a constraint set as separate facts.
Hence, for closed sets, subsumption equals set inclusion.

Proof.

(a) The top and bottom properties are trivial.

The existence of a lub for any two constraint sets $\chi_1(X), \chi_2(X) \in \mathit{nf}\mathcal{CCS}^{\bot}$
is shown as follows. We write $\chi'$ for $\mathit{closure}(\chi_1(X) \cup \chi_2(X))$.

If $\chi'$ is inconsistent, then $\bot$ is obviously the lub.

Otherwise, assume $\chi'' \in \mathcal{CCS}$ with $\chi_1(X) \sqsubseteq \chi''$ and $\chi_2(X) \sqsubseteq \chi''$.

Then $\mathit{closure}(\chi_1(X)) \subseteq \mathit{closure}(\chi'')$, and $\mathit{closure}(\chi_2(X)) \subseteq \mathit{closure}(\chi'')$.
Hence $\chi' \subseteq \mathit{closure}(\chi'')$, and $\mathit{nf}(\chi')$ is the least upper bound in $\mathit{nf}\mathcal{CCS}^{\bot}$.
The existence of a glb follows in similar fashion.

(c) Straight from Lemma 8.13 and Definition 8.16.

(b) Direct from (a) and (c). □

Corollary 8.18.

For any pair of consistent constraint sets in normal form $\chi_1(X), \chi_2(X) \in
\mathit{nf}\mathcal{CCS}$ it holds that

$\chi_1(X) \sqcup \chi_2(X) = \mathit{nf}(\chi_1(X) \cup \chi_2(X))$. □

We have defined $\sqcup$ as a least upper bound, derived from the subsumption
relation $\sqsubseteq$. In practical applications, we see $\sqcup$ as an operator that allows us to
construct new feature structures by merging the features of existing feature
structures. How such a merge is carried out in an efficient manner is not a
direct concern here. We will come back to that issue in Chapter 9.

Having proven that normal forms of consistent constraint sets and feature
graphs are isomorphic, we can abstract from the particular representation and
simply call it a feature structure. We write $\varphi(X)$ to denote a feature structure,
or simply $\varphi$ if it is not relevant which object $X$ is characterized by the features
in $\varphi$. A feature structure will be interpreted in an opportunistic manner either
as feature graph or as constraint set, whatever is most convenient.

We write $\varphi(X).\pi$ to denote the substructure of $\varphi(X)$ that is (in the graph
representation!) the largest subgraph of which $[\langle X\pi \rangle]$ is the root. We write
$\varphi(X).\pi = c$ if (in constraint set representation!) $\langle X\pi \rangle = c \in \mathit{closure}(\varphi(X))$.

As an informal notation for feature structures we write AVMs, feature 
graphs or constraint sets. It is not required that a constraint set be in normal 
form. Normal forms were important because the lattice structure is defined on 
normal forms, but for any practical application any equivalent specification 
of a constraint set will do as well. 



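At last we can now explain the difference between type identity and token
identity. Consider the following feature structures:

$$
\varphi_1 = \begin{bmatrix} f: & \begin{bmatrix} f: c \\ g: d \end{bmatrix} \\ g: & \begin{bmatrix} f: c \\ g: d \end{bmatrix} \end{bmatrix},
\qquad
\varphi_2 = \begin{bmatrix} f: & \boxed{1} \begin{bmatrix} f: c \\ g: d \end{bmatrix} \\ g: & \boxed{1} \end{bmatrix}
$$

Then the substructures $\varphi_1.f$ and $\varphi_1.g$ are called type identical: they have the
same value, but they are different structures. The substructures $\varphi_2.f$ and
$\varphi_2.g$ are called token identical: they refer to a single structure (and have the
same value a fortiori). Note that $\varphi_1 \sqsubseteq \varphi_2$, because the constraint set of $\varphi_2$
can be obtained from the constraint set of $\varphi_1$ by adding a constraint (viz.
$\langle X f \rangle = \langle X g \rangle$). The difference between these structures comes to light when
either structure is unified with $\varphi' = \begin{bmatrix} g: & [\, h: e \,] \end{bmatrix}$, yielding

$$
\varphi_1 \sqcup \varphi' = \begin{bmatrix} f: & \begin{bmatrix} f: c \\ g: d \end{bmatrix} \\ g: & \begin{bmatrix} f: c \\ g: d \\ h: e \end{bmatrix} \end{bmatrix}
\ \sqsubseteq\
\varphi_2 \sqcup \varphi' = \begin{bmatrix} f: & \boxed{1} \begin{bmatrix} f: c \\ g: d \\ h: e \end{bmatrix} \\ g: & \boxed{1} \end{bmatrix}
$$

In the sequel, we write the usual equality symbol ($=$) for type identity and a
dotted equality symbol ($\doteq$) to denote token identity. So we have $\varphi_1.f = \varphi_1.g$,
$\varphi_2.f = \varphi_2.g$, $\varphi_2.f \doteq \varphi_2.g$, but $\varphi_1.f \not\doteq \varphi_1.g$.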

The difference between type identity and token identity is only relevant 
for substructures. For constants it doesn’t make any difference whether a 
value is token identical to or a copy of some given other constant. 
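The distinction has a close analogue in programming languages that separate equality of values from identity of objects. The sketch below uses nested Python dictionaries as a stand-in for feature structures (an illustrative encoding, not the formal model of this chapter): `==` plays the role of type identity and `is` the role of token identity. Adding `h: e` below `g` in place stands in for unification with $\varphi'$, which is adequate here only because no feature values clash.

```python
def make_phi1():
    # Two separate but equal substructures: type identical only
    return {"f": {"f": "c", "g": "d"}, "g": {"f": "c", "g": "d"}}


def make_phi2():
    # One substructure reachable through both f and g: token identical
    shared = {"f": "c", "g": "d"}
    return {"f": shared, "g": shared}


phi1, phi2 = make_phi1(), make_phi2()

type_identical_1 = phi1["f"] == phi1["g"]      # same value
token_identical_1 = phi1["f"] is phi1["g"]     # same object?
type_identical_2 = phi2["f"] == phi2["g"]
token_identical_2 = phi2["f"] is phi2["g"]

# Add the information [g: [h: e]] to both structures
phi1["g"]["h"] = "e"
phi2["g"]["h"] = "e"
```

After the update, `phi1["f"]` is unchanged, whereas `phi2["f"]` has acquired the feature `h` as well, because it is the very same object as `phi2["g"]`.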



8.3 Composite feature structures 

So far we have defined feature structures, that capture the characteristic 
properties of some object. It is essential, however, to add the conceptual 
machinery that allows us to relate the features of different objects to one 
another. To this end we introduce feature structures that describe the features 
of a (finite) set of objects. Features can be shared between objects by means 
of token identity. 

Composite constraint sets for sets of objects are only a minimal extension 
of the constraint sets of Section 8.1: coreferencing is allowed between (features 
of) different objects. In the domain of feature graphs, we get a set of graphs 
that may share subgraphs. Or, to put it differently, we get a single graph 
with multiple roots. 

Definition 8.19. (multi-rooted feature graphs)

A multi-rooted feature graph is a structure $\Gamma(X_1, \ldots, X_k) = (U, E, R)$ with
$(U, E)$ a finite DAG and $R = \{ r_1, \ldots, r_k \} \subseteq U$, with the following properties:

(i) every edge is labelled with a feature;

(ii) if $f$ and $g$ are labels of distinct edges originating from the same vertex, then
$f \neq g$;

(iii) leaves may be (but need not be) labelled with a constant, non-leaf
vertices do not carry a constant label;

(iv) for every $u \in U$ there is some $r \in R$ such that $r \longrightarrow u$.

We call $R$ the root set of the graph. The size of the root set must correspond
to the number of formal parameters; the roots can be labelled
with identifiers referring to the objects whose features are represented.^

We write $\mathcal{MFG}$ for the class of multi-rooted feature graphs. □

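Definition 8.20. (composite constraint sets, closure)

Let $X_1, \ldots, X_k$ denote a finite set of objects. A (composite) constraint set
$\chi(X_1, \ldots, X_k)$ is a finite set of constraints from the domains of value
constraints, existential constraints and composite coreference constraints, defined
as follows:

$\mathcal{VC} = \{ \langle X_i \pi \rangle = c \mid 1 \leq i \leq k \wedge \pi \in \mathit{Fea}^* \wedge c \in \mathit{Const} \}$,

$\mathcal{CCC} = \{ \langle X_i \pi \rangle = \langle X_j \rho \rangle \mid 1 \leq i \leq k \wedge 1 \leq j \leq k \wedge \pi, \rho \in \mathit{Fea}^* \}$.

Again (see Footnote 3) we write $\langle X_i \pi \rangle = [\ ]$ as a more convenient notation
for $\langle X_i \pi \rangle = \langle X_i \pi \rangle$. □

Definition 8.21. (closure of a composite constraint set)

Let $\chi(X_1, \ldots, X_k) \subseteq \mathcal{VC} \cup \mathcal{CCC}$ be a constraint set. The closure of
$\chi(X_1, \ldots, X_k)$, denoted $\mathit{closure}(\chi(X_1, \ldots, X_k))$, is the smallest set satisfying

(i) $\langle X_i \rangle = [\ ] \in \mathit{closure}(\chi(X_1, \ldots, X_k))$ for $1 \leq i \leq k$;

(ii) if $\langle X_i \pi \rangle = \mu \in \chi(X_1, \ldots, X_k)$
then $\langle X_i \pi \rangle = \mu \in \mathit{closure}(\chi(X_1, \ldots, X_k))$;

(iii) if $\langle X_i \pi \rangle = \langle X_j \pi' \rangle$, $\langle X_i \pi\rho \rangle = \mu \in \mathit{closure}(\chi(X_1, \ldots, X_k))$
then $\langle X_j \pi'\rho \rangle = \mu \in \mathit{closure}(\chi(X_1, \ldots, X_k))$;

(iv) if $\langle X_i \pi \rangle = \langle X_j \rho \rangle \in \mathit{closure}(\chi(X_1, \ldots, X_k))$
then $\langle X_j \rho \rangle = \langle X_i \pi \rangle \in \mathit{closure}(\chi(X_1, \ldots, X_k))$;

(v) if $\langle X_i \pi\rho \rangle = \mu \in \mathit{closure}(\chi(X_1, \ldots, X_k))$
then $\langle X_i \pi \rangle = [\ ] \in \mathit{closure}(\chi(X_1, \ldots, X_k))$.

A constraint set $\chi(X_1, \ldots, X_k)$ is called closed if $\mathit{closure}(\chi(X_1, \ldots, X_k)) =
\chi(X_1, \ldots, X_k)$. □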



^ Note that it is not required that a root $r_i$ has no incoming edges. It is conceivable
that one root is the descendant of another root (and also that several roots
coincide). In that case, the features of one object are token identical with a
substructure of the features of another object.

Definition 8.22. (consistency)

A closed composite constraint set $\chi(X_1, \ldots, X_k)$ is called consistent if it
satisfies the following properties:

(i) if $\langle X_i \pi \rangle = c \in \chi(X_1, \ldots, X_k)$ and $\langle X_i \pi \rangle = d \in \chi(X_1, \ldots, X_k)$
then $c = d$;

(ii) if $\langle X_i \pi \rangle = c \in \chi(X_1, \ldots, X_k)$ and $\langle X_i \pi\rho \rangle = \mu \in \chi(X_1, \ldots, X_k)$
then $\rho = \varepsilon$;

(iii) $\langle X_i \pi\rho \rangle = \langle X_i \pi \rangle$ and $\langle X_i \pi \rangle = \langle X_i \pi\rho \rangle$ are not in $\chi(X_1, \ldots, X_k)$
for any $i$, $\pi$, and non-empty $\rho$.

An arbitrary composite constraint set $\chi(X_1, \ldots, X_k)$ is consistent if
$\mathit{closure}(\chi(X_1, \ldots, X_k))$ is consistent.

We write $\mathcal{CCCS}$ for the set of consistent composite constraint sets. □

Definition 8.23. (mappings, normal form)

The mappings graph and constraints can be extended to composite constraint
sets and multi-rooted feature graphs in the obvious way (and it can be verified
straightforwardly that these functions are well-defined).

The function $\mathit{nf} : \mathcal{CCCS} \to \mathcal{CCCS}$ is defined by

$\mathit{nf}(\chi(X_1, \ldots, X_k)) = \mathit{constraints}(\mathit{graph}(\chi(X_1, \ldots, X_k)))$.

We write $\mathit{nf}\mathcal{CCCS}$ for the set of constraint sets that satisfy

$\mathit{nf}(\chi(X_1, \ldots, X_k)) = \chi(X_1, \ldots, X_k)$. □



Definition 8.24. (substructures)

Let $\Gamma(X_1, \ldots, X_k) = (U, E, \{ r_1, \ldots, r_k \}) \in \mathcal{MFG}$ describe the features of
a set of $k$ objects. The feature graphs of a subset of this set of objects are
described by a subgraph, as follows.

Let $\{ X_{i_1}, \ldots, X_{i_m} \} \subseteq \{ X_1, \ldots, X_k \}$.

Then $\Gamma(X_{i_1}, \ldots, X_{i_m}) = (U', E', \{ r_{i_1}, \ldots, r_{i_m} \})$ is defined by

$U' = \{ u \in U \mid r_{i_j} \longrightarrow u \text{ for some } j\ (1 \leq j \leq m) \}$,

$E' = \{ u \to v \in E \mid u \in U' \}$.

Similarly, a substructure is defined for closed constraint sets^.

Let $\chi(X_1, \ldots, X_k)$ be a closed constraint set. A (closed) substructure
$\chi(X_{i_1}, \ldots, X_{i_m})$ for $\{ X_{i_1}, \ldots, X_{i_m} \} \subseteq \{ X_1, \ldots, X_k \}$ is defined by

$$
\begin{array}{rl}
\chi(X_{i_1}, \ldots, X_{i_m}) = & \{ \langle X_{i_j} \pi \rangle = c \in \chi(X_1, \ldots, X_k) \mid 1 \leq j \leq m \}\ \cup \\
 & \{ \langle X_{i_j} \pi \rangle = [\ ] \in \chi(X_1, \ldots, X_k) \mid 1 \leq j \leq m \}\ \cup \\
 & \{ \langle X_{i_j} \pi \rangle = \langle X_{i_l} \rho \rangle \in \chi(X_1, \ldots, X_k) \mid 1 \leq j \leq m \wedge 1 \leq l \leq m \}.
\end{array}
$$

For $\chi(X_1, \ldots, X_k) \in \mathit{nf}\mathcal{CCCS}$ and $\{ X_{i_1}, \ldots, X_{i_m} \} \subseteq \{ X_1, \ldots, X_k \}$ we define
a substructure $\chi(X_{i_1}, \ldots, X_{i_m})$ as follows.

Let $\chi'(X_1, \ldots, X_k) = \mathit{closure}(\chi(X_1, \ldots, X_k))$;

then $\chi(X_{i_1}, \ldots, X_{i_m}) = \mathit{nf}(\chi'(X_{i_1}, \ldots, X_{i_m}))$. □

^ We cannot simply apply the same definition to arbitrary constraint sets: if a
feature of some $X_{i_j}$ is token identical with an object that is no longer represented in
the substructure, all constraints relating to that part of the deleted substructure
must be taken into account as well. Only in closed constraint sets is it guaranteed
that every feature of an object is completely described by constraints for that
object.
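Taking a substructure of a closed composite constraint set amounts to a filter. The sketch below assumes an encoding that is not part of the text: a left-hand side is a pair `(object, path)`, and the right-hand side is either `("const", c)` or `("path", (object, path))`. The helper names `exists` and `coref` are hypothetical. The example set is the closure of $\{ \langle X_1\ \mathit{agr} \rangle = \langle X_2\ \mathit{agr} \rangle, \langle X_2\ \mathit{agr}\ \mathit{num} \rangle = \mathit{sg} \}$, written out by hand.

```python
def substructure(closed, keep):
    """Restrict a closed composite constraint set to the objects in `keep`."""
    result = set()
    for (obj, path), rhs in closed:
        if obj not in keep:
            continue
        # Value constraints are kept; coreferences (including existential
        # constraints) are kept only if they stay inside the retained objects.
        if rhs[0] == "const" or rhs[1][0] in keep:
            result.add(((obj, path), rhs))
    return result


def exists(obj, *path):
    return ((obj, path), ("path", (obj, path)))


def coref(obj1, path1, obj2, path2):
    return ((obj1, path1), ("path", (obj2, path2)))


closed = {
    exists("X1"), exists("X2"),
    exists("X1", "agr"), exists("X2", "agr"),
    exists("X1", "agr", "num"), exists("X2", "agr", "num"),
    coref("X1", ("agr",), "X2", ("agr",)),
    coref("X2", ("agr",), "X1", ("agr",)),
    coref("X1", ("agr", "num"), "X2", ("agr", "num")),
    coref("X2", ("agr", "num"), "X1", ("agr", "num")),
    (("X1", ("agr", "num")), ("const", "sg")),
    (("X2", ("agr", "num")), ("const", "sg")),
}
only_x1 = substructure(closed, {"X1"})
```

The value `sg` was originally stated for `X2` only, yet it survives in the substructure for `X1` because the closure had already copied it there. Applying the same filter to the unclosed set would have lost it, which is the point of the footnote above.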

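Definition 8.25. (composite feature lattices)

We define a set $\bot_{cccs}$ by

$\bot_{cccs} = \mathcal{VC} \cup \mathcal{CCC}$.

As inconsistent $\mathcal{MFG}$ we define a multi-rooted graph $\bot_{mfg} = (U_{\bot}, E_{\bot}, R_{\bot})$
with an infinite root set:

$U_{\bot} = R_{\bot} = \{ r_1, r_2, \ldots \}$,

$E_{\bot} = \{ r_i \xrightarrow{f} r_j \mid r_i, r_j \in R_{\bot} \wedge f \in \mathit{Fea} \}$.

Each vertex $r_i$ can be thought of as being labelled with all constants at once.
The functions graph and constraints are extended to map $\bot_{cccs}$ and $\bot_{mfg}$
onto each other.

We define the domains

$\mathit{nf}\mathcal{CCCS}^{\bot} = \mathit{nf}\mathcal{CCCS} \cup \{ \bot_{cccs} \}$,

$\mathcal{MFG}^{\bot} = \mathcal{MFG} \cup \{ \bot_{mfg} \}$. □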



8.4 Composite feature lattices 

Before we define subsumption on composite feature structures, we must
clarify the distinction between objects and formal parameters. It is our purpose
to derive a binary operator $\sqcup$ that can be used to unify feature structures. A
feature structure $\varphi(X_1, \ldots, X_k) \sqcup \varphi(Y_1, \ldots, Y_l)$ combines the features of both
structures. It is important to know, however, which $X$’s and which $Y$’s refer
to identical objects. Let, for example, $X_3 = Y_2$ and all other $X_i$ and $Y_j$ be
different. Then in the unified feature structure $\varphi(X_1, \ldots, X_k) \sqcup \varphi(Y_1, \ldots, Y_l)$
there is (a parameter for) an object that will contain both the features of
$\varphi(X_3)$ and $\varphi(Y_2)$. (Note, however, that $\varphi(X_3)$ and $\varphi(Y_2)$ are separate
feature structures. Features can be shared across objects (or parameters) within
a single composite feature structure, but features cannot be shared across
different composite feature structures.) Hence it is essential to know which
parameters denote which objects, so that the right pairs of features are
unified when we unify two composite feature structures. Therefore we assume
the existence of a (possibly infinite but countable) domain of objects and
postulate that each parameter refers to an object.

In a practical notation, we could annotate the unification with which
parameters should be considered to refer to the same object. The above case
can be denoted as

$\varphi(X_1, \ldots, X_k) \sqcup_{X_3 = Y_2} \varphi(Y_1, \ldots, Y_l)$.

As indices to the unification we write (sequences of) equalities that denote
correspondence between formal parameters of either argument. In the unlikely
case that all formal parameters are different we could write $\sqcup_{\emptyset}$ (but this
operation will not be used in the sequel). Hence, when we write an unqualified
lub symbol $\sqcup$ it should be clear from the context which parameters of both
arguments refer to the same object. This will usually be the case.

In practical use, we see U as an operator that can be used to construct 
new feature structures from existing feature structures. But before we start 
using it, we have to define U formally as a least upper bound in a lattice. 

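Definition 8.26. (subsumption)

A subsumption relation $\sqsubseteq$ is defined on $\mathit{nf}\mathcal{CCCS}^{\bot}$ as follows:^

$\chi_1(X_1, \ldots, X_k) \sqsubseteq \chi_2(Y_1, \ldots, Y_l)$ holds if

(i) $\{ X_1, \ldots, X_k \} \subseteq \{ Y_1, \ldots, Y_l \}$, and

(ii) $\mathit{closure}(\chi_1(X_1, \ldots, X_k)) \subseteq \mathit{closure}(\chi_2(X_1, \ldots, X_k))$.

A subsumption relation $\sqsubseteq$ is defined on $\mathcal{MFG}^{\bot}$ by

$\Gamma_1(X_1, \ldots, X_k) \sqsubseteq \Gamma_2(Y_1, \ldots, Y_l)$ holds if

$\mathit{constraints}(\Gamma_1(X_1, \ldots, X_k)) \sqsubseteq \mathit{constraints}(\Gamma_2(Y_1, \ldots, Y_l))$. □

Theorem 8.27. (lattice structure)

The following statements hold:

(a) $(\mathit{nf}\mathcal{CCCS}^{\bot}, \sqsubseteq)$ is a lattice with the empty set^ as bottom and top $\bot_{cccs}$.
(b) $(\mathcal{MFG}^{\bot}, \sqsubseteq)$ is a lattice with the empty graph as bottom and top $\bot_{mfg}$.
(c) $\mathit{graph} : \mathit{nf}\mathcal{CCCS}^{\bot} \to \mathcal{MFG}^{\bot}$ is an isomorphism with respect to $\sqsubseteq$;
$\mathit{constraints} : \mathcal{MFG}^{\bot} \to \mathit{nf}\mathcal{CCCS}^{\bot}$ is the inverse isomorphism.

Proof: straightforward extension of the proof of Theorem 8.17 and preceding
lemmata. □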

^ In general, it will be clear how the formal parameters of the left operand of $\sqsubseteq$
correspond to formal parameters of the right operand (typically, $k = l$ and $X_i$
corresponds to $Y_i$). In cases where it is not obvious (but those will not appear
in this book) one could annotate $\sqsubseteq$ with correspondences, similar to $\sqcup$ discussed
above.

^ The bottom element is to be interpreted as a constraint set for no objects (as
opposed to the empty constraint for a given object in Theorem 8.17).

Corollary 8.28.

For consistent composite constraint sets in normal form
$\chi_1(X_1, \ldots, X_k), \chi_2(Y_1, \ldots, Y_l) \in \mathit{nf}\mathcal{CCCS}$ it holds that

$$
\begin{array}{l}
\chi_1(X_1, \ldots, X_k) \sqcup_{X_{i_1} = Y_{j_1}, \ldots, X_{i_m} = Y_{j_m}} \chi_2(Y_1, \ldots, Y_l) = \\
\quad \mathit{nf}(\chi_1(X_1, \ldots, X_k) \cup \chi_2(Y_1, \ldots, Y_l) \\
\quad\quad \cup\ \{ \langle X_{i_1} \rangle = \langle Y_{j_1} \rangle, \ldots, \langle X_{i_m} \rangle = \langle Y_{j_m} \rangle \}). \quad □
\end{array}
$$

As with constraint sets and feature graphs, we will blur the distinction
between composite constraint sets and multi-rooted feature graphs. We
simply write $\varphi(X_1, \ldots, X_k)$ to denote a composite feature structure for $k$ objects.
As in 8.1 we write $\Phi$ to denote both lattices $(\mathit{nf}\mathcal{CCCS}^{\bot}, \sqsubseteq)$ and $(\mathcal{MFG}^{\bot}, \sqsubseteq)$.
If we need one particular representation we will pick the one that is easiest
to work with, depending on the circumstances.

From a composite feature structure $\varphi(X_1, \ldots, X_k)$ one can derive a feature
structure $\varphi(X_i)$ for any object, by taking the appropriate substructure. As a
convenient notation we write

$\varphi(X_i) = \varphi(X_1, \ldots, X_k)|_{X_i}$

to denote that a feature structure for an object $X_i$ is obtained by retrieving
it from some composite structure.

Up to now we have only attributed features to sets of objects. It is possible
that the objects themselves are contained in a structure of some kind. We call
these object structures so as to avoid confusion with feature structures. Typical
object structures that we will use in the remainder of this chapter are

• A production $A \to \alpha$ from a context-free grammar. We write $\varphi(A \to \alpha)$ as
a convenient notation for a composite feature structure $\varphi(A, X_1, \ldots, X_k)$
that describes features of left-hand and right-hand side symbols, where
$\alpha = X_1 \ldots X_k$.

• A tree $\langle A \leadsto \alpha \rangle$. We write $\varphi(\langle A \leadsto \alpha \rangle)$ as a convenient notation for a
composite feature structure $\varphi(A, \ldots, X_1, \ldots, X_k)$, where $\alpha = X_1 \ldots X_k$.

• An item $[A \leadsto \alpha]$. Items were introduced in Chapter 4 as sets of trees. Here
we should see them as abstractions of trees: we only know the root and the
yield of the item; we do not know (or do not want to know) the internal
nodes. Consequently, features can be retrieved only from the nodes that
are explicitly mentioned in the denotation of the item. Hence, a composite
feature structure of an item $[A \leadsto \alpha]$ can be seen as a substructure of a
composite feature structure of a tree $\langle A \leadsto \alpha \rangle$, from which the features of
internal nodes have been deleted.

We write $\varphi([A \leadsto \alpha])$ as a convenient notation for a composite feature
structure $\varphi(A, X_1, \ldots, X_k)$ where $\alpha = X_1 \ldots X_k$.

A similar interpretation will be given to various kinds of items that give
various kinds of partial specifications of trees. As an example, consider the
item $[S \to NP \bullet VP, 0, 2]$, specifying the fact that an $NP$ has been found by
scanning the first two words (but we don’t care to remember what those
words were). A feature structure $\varphi([S \to NP \bullet VP, 0, 2])$ will be a composite
feature structure $\varphi(S, NP, VP)$ that denotes the appropriate substructure
of $\varphi(\langle S \to \langle NP \leadsto a_1 a_2 \rangle\, VP \rangle)$.



8.5 Unification grammars 

With the lattice of (composite) feature structures, developed in 8.1 and 8.3,
we can now formally define a unification grammar as it has been informally
presented in Chapter 7.

The definition of unification grammars that we present here is not the
most compact one that is possible. One could eliminate the context-free
backbone and let syntactic category be a feature like any other. If one abstracts
from the syntactic category as a special feature, the definitions and notations
become more terse, but somewhat more obscure. For the sake of clarity and
compatibility with the other chapters, we will not do so.

We take it for granted that syntactic category is such a fundamental
notion that every feature structure for every constituent contains at least
a cat feature. Hence, in order to obtain a legible notation, we continue to
call nodes in a tree by their syntactic category, as we did with context-free
grammars.

Definition 8.29. (unification grammar) 

A unification grammar is a structure 

$$\mathcal{G} = (G, \Phi, \varphi_0, W, \mathcal{L}ex).$$ 

The different parts of this structure are defined as follows: 

• $G = (N, \Sigma, P, S)$ is a context-free grammar. We write $V$ for $N \cup \Sigma$; it is 
not required that $N \cap \Sigma = \emptyset$: a syntactic category is allowed to be both 
terminal and nonterminal. 

Furthermore, $P$ is a multiset of productions, i.e., it is allowed that a single 
context-free production occurs more than one time. 

• $\Phi = \Phi(\mathit{Fea}, \mathit{Const})$ is the lattice of feature structures based on a set of 
features $\mathit{Fea}$ and a set of constants $\mathit{Const}$. It is assumed (but not necessary) 
that $\mathit{Fea} \cap \mathit{Const} = \emptyset$. We assume $\mathit{cat} \in \mathit{Fea}$ and $V \subset \mathit{Const}$, allowing for 
syntactic categories to be represented in a feature structure. 

• $\varphi_0 : P \to \Phi$ is a function that assigns a composite feature structure to each 
production in the context-free grammar. Let $p = A \to X_1 \ldots X_k$ for some 
$p \in P$ and $\varphi_0(p)$ be a feature structure $\varphi(Y_0, Y_1, \ldots, Y_k)$. Then, obviously, 
it is required that $\varphi(Y_0).\mathit{cat} = A$ and, for $1 \le i \le k$, $\varphi(Y_i).\mathit{cat} = X_i$. 
Different feature structures can be attributed to a single context-free 
production by including the production more than once in $P$. 

• $W$ is a set of lexicon entries, i.e., "real" word forms, as opposed to lexical 
categories in $\Sigma$. It is assumed (but not necessary) that $V \cap W = \emptyset$. We 
write $\underline{a}, \ldots$ for words in $W$. 

• $\mathcal{L}ex$ is a function that assigns a set of feature structures to each word in $W$ 
(a word may have different readings). Each $\varphi(\underline{a}) \in \mathcal{L}ex(\underline{a})$ for each $\underline{a} \in W$ 
must have a feature $\mathit{cat}$. Moreover, it is required that $\varphi(\underline{a}).\mathit{cat} \in \Sigma$. 

We write $\mathcal{UG}$ for the class of unification grammars $\mathcal{G}$ that satisfy the above 
properties. □ 
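
As an illustration only, the components of Definition 8.29 can be written down as a small data structure. The sketch below rests on simplifying assumptions that are not part of the formalism: feature structures are nested dictionaries without coreferences, and the composite feature structure of a production is a list with one dictionary per node. The class name `UnificationGrammar`, its method `check`, and the feature `mood` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class UnificationGrammar:
    """Hypothetical container for (G, Phi, phi_0, W, Lex); coreferences are not modelled."""
    nonterminals: set
    terminals: set      # lexical categories (Sigma)
    start: str
    # P as a multiset: a list of (lhs, rhs, phi_0), where phi_0 is a list
    # [fs(A), fs(X1), ..., fs(Xk)] of nested dicts, one per node.
    productions: list = field(default_factory=list)
    # Lex: word form -> list of feature structures (one per reading)
    lexicon: dict = field(default_factory=dict)

    def check(self):
        """Return the violations of the cat requirements as a list of strings."""
        errors = []
        for lhs, rhs, phi0 in self.productions:
            labels = [lhs, *rhs]
            rule = f"{lhs} -> {' '.join(rhs)}"
            if len(phi0) != len(labels):
                errors.append(f"{rule}: expected {len(labels)} node structures")
                continue
            for label, node in zip(labels, phi0):
                if node.get("cat") != label:
                    errors.append(f"{rule}: node {label} has cat {node.get('cat')!r}")
        for word, readings in self.lexicon.items():
            for reading in readings:
                if reading.get("cat") not in self.terminals:
                    errors.append(f"{word}: cat {reading.get('cat')!r} is not in Sigma")
        return errors


grammar = UnificationGrammar(
    nonterminals={"S", "NP", "VP"},
    terminals={"*det", "*n", "*v"},
    start="S",
    productions=[
        ("S", ("NP", "VP"), [{"cat": "S"}, {"cat": "NP"}, {"cat": "VP"}]),
        # the same context-free production may occur twice with different features
        ("S", ("NP", "VP"), [{"cat": "S", "mood": "decl"}, {"cat": "NP"}, {"cat": "VP"}]),
    ],
    lexicon={"cat": [{"cat": "*n", "number": "singular"}]},
)
print(grammar.check())
```

Keeping the productions in a list rather than a set mirrors the multiset $P$: the second entry repeats the context-free production of the first with a different decoration.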

One could argue whether the lexicon is part of the grammar or a separate 
structure. The size of the grammar is reduced tremendously when the lexicon 
is not contained in the grammar. It is somewhat artificial, however, to assume 
a grammar with production features $\varphi_0$ existing independently of a lexicon 
$(W, \mathcal{L}ex)$. The trend in unification grammars is that more and more information 
is stored in the lexicon, and the productions merely serve to prescribe 
concatenation and feature unification. 

The reason for introducing an alphabet $W$, consisting of words with lexicon 
entries, is the following. In context-free parsing of natural languages it 
is standard practice to consider the word categories, rather than the words from 
the lexicon, as terminal symbols. In Chapters 2 and 3 we have introduced the 
notational convention that leaves $a, b, \ldots$ in a parse tree indicate a terminal 
symbol, while leaves $\underline{a}, \underline{b}, \ldots$ indicate that these leaves correspond to words 
from the actual sentence that has to be parsed. In Chapter 2 the underlined 
terminal symbols were added to the grammar in the following way: 

• for the $i$-th word of the sentence, extra productions $a \to \underline{a}_i$ are added for 
each possible lexical category of that word. 

Verification that a word occurs in the sentence, therefore, could be expressed 
in terms of tree operations. For each auxiliary production we can supply a 
feature structure (in constraint set notation) 

$$\varphi_0(a \to \underline{a}_i) = \{\langle a \rangle \doteq \langle \underline{a}_i \rangle\}.$$ 

These auxiliary productions are not part of the grammar, but an implementation 
technique that is used to construct the parse of a given sentence. We 
will stick to this notation, for the moment, because it allows us to express 
the difference between terminals that have been matched with the sentence 
and those that haven't been matched yet. 



Alternatively, one could have $P$ as a proper set and attribute a set of composite 
feature structures to each production. There is no need to use multisets, then, 
but in the remainder of the chapter the expression "$\varphi_0(A \to \alpha)$" has to be replaced 
by "some $\varphi$ in $\varphi_0(A \to \alpha)$". 

When we abstract from trees to items, in Section 8.7, we will simply have 
initial items of the form $[a, j-1, j]$ with a feature structure $\varphi(a) \in \mathcal{L}ex(\underline{a}_j)$. 
The careful distinction between matched leaves and non-matched leaves will 
no longer be relevant then. 

Grammars may include $\varepsilon$-productions. In Section 3.1 we defined trees in 
such a way that an $\varepsilon$-production generates a leaf labelled $\varepsilon$. Throughout the 
remainder of this chapter we will simply assume that such leaves labelled 
$\varepsilon$ are not decorated with any features. With this restriction, an arbitrary 
production $A \to \alpha$ in all the following definitions also applies to $A \to \varepsilon$. 

Definition 8.30. (decorated trees) 

Let $\tau \in \mathit{Trees}(G)$ be a tree (cf. Definition 3.10.(iii)) and $\varphi(\tau)$ a composite 
feature structure for the nodes in $\tau$. In order to simplify notation we write 
$\varphi(X)$ for the feature structure of a node with label $X$. 

A decorated tree is a pair $(\tau, \varphi(\tau))$ with $\tau \in \mathit{Trees}(G)$ satisfying the following 
conditions: 

(i) for each node labelled $A$ with children labelled $X_1, \ldots, X_k$ there is a 
production $A \to X_1 \ldots X_k \in P$ such that 

(ii) for each node labelled $a$ with child labelled $\underline{a}_i$ it holds that $\varphi(a) = \varphi(\underline{a}_i)$; 
(iii) for each node labelled $\underline{a}_i$ there is some $\varphi'(\underline{a}_i) \in \mathcal{L}ex(\underline{a}_i)$ such that 

We write $\mathit{VTrees}(\mathcal{G})$ for the set of decorated trees for some unification 
grammar $\mathcal{G}$. □ 

In 8.6, as in Chapter 2, we will construct parse trees by means of composition 
of smaller trees. Any tree can be composed from atomic trees. When 
a new tree is created that is a composition of two existing trees, its features 
will be merged. In this way, context-free parse trees can be obtained that 
are decorated with feature structures. We should make sure, however, that 
the feature structure of a parse tree contains only “adequate” features (in 
a sense to be made precise shortly) which are derived from the productions 
and lexicon. One can always extend the decoration of a tree by adding new 
features out of the blue. For a decorated parse tree, it should be required 
that no unnecessary features have sneaked in. The following definition rules 
out “over-decorated” trees. 

Definition 8.31. {adequately decorated trees) 

We define adequate decoration of trees by induction on the tree structure. 

Note that different nodes of a tree may carry the same label, so we only use this 
notation when there can be no confusion about which node with this label is 
meant. 

The reader might wonder why we do not give a direct definition of a minimally 
decorated tree. One could call $(\tau, \varphi(\tau))$ minimally decorated if there is no 
decoration $\varphi'(\tau) \neq \varphi(\tau)$ such that $\varphi'(\tau) \sqsubseteq \varphi(\tau)$. The problem is, however, that 
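Let $\mathcal{G} \in \mathcal{UG}$ be a unification grammar and $(\tau, \varphi(\tau))$ a decorated tree. The 
adequacy of the decoration $\varphi(\tau)$ is defined as follows, depending on the form 
of $\tau$. 

• $\tau = \langle a \to \underline{a}_i \rangle$ (i.e. $\tau$ matches a terminal with a word in the sentence). 
Then the decoration is adequate if $\varphi(a) = \varphi(\underline{a}_i) \in \mathcal{L}ex(\underline{a}_i)$. 

• $\tau = \langle A \to \alpha \rangle$ (i.e. $\tau$ covers a single production). 

Then the decoration is adequate if $\varphi(\tau) = \varphi_0(A \to \alpha)$. 

• $\tau = \langle A \to \langle \alpha \leadsto \beta \rangle \rangle$ (i.e., a production $\langle A \to \alpha \rangle$ constitutes the top of the 
tree). 

Let $\alpha = X_1 \ldots X_k$, $\beta = \beta_1 \ldots \beta_k$, such that $\langle X_i \leadsto \beta_i \rangle$ is a subtree of $\tau$ for 
$1 \le i \le k$. 

We distinguish between degenerate subtrees, having a single node $X_i = \beta_i$ 
and no edges, and nondegenerate subtrees, having more than one node and 
at least one edge. The (only) adequate decoration for a degenerate subtree 
is the empty feature structure. 

Then $\varphi(\tau)$ is an adequate decoration if there are adequately decorated trees 
$(\langle X_1 \leadsto \beta_1 \rangle, \varphi'(\langle X_1 \leadsto \beta_1 \rangle)), \ldots, (\langle X_k \leadsto \beta_k \rangle, \varphi'(\langle X_k \leadsto \beta_k \rangle))$ 

such that 

$$\varphi(\langle A \to \langle \alpha \leadsto \beta \rangle \rangle) = \varphi'(\langle A \to \alpha \rangle) \sqcup \varphi'(\langle X_1 \leadsto \beta_1 \rangle) \sqcup \ldots \sqcup \varphi'(\langle X_k \leadsto \beta_k \rangle).$$ □ 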



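Definition 8.32. (parse tree) 

Let $\mathcal{G}$ be a unification grammar, $\underline{a}_1 \ldots \underline{a}_n$ a string in $W^*$. A parse tree for 
$\underline{a}_1 \ldots \underline{a}_n$ is an adequately decorated tree of the form 

$$(\langle S \leadsto \underline{a}_1 \ldots \underline{a}_n \rangle, \varphi(\langle S \leadsto \underline{a}_1 \ldots \underline{a}_n \rangle))$$ 

with $\varphi(\langle S \leadsto \underline{a}_1 \ldots \underline{a}_n \rangle) \neq \bot$. □ 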



adequately decorated trees need not be minimal. 

As an example, consider a grammar with the following productions: 

(1) $A \to B$, $\varphi(B) = [f : a]$, 

(2) $A \to B$, $\varphi(B) = [g : b]$, 

(3) $B \to C$, $\varphi(B) = [g : b]$. 

A tree $\langle A \leadsto C \rangle$ composed from the elementary trees of productions (1) and (3) 
is decorated adequately, but not minimally. 

In a practical grammar, it is likely that every adequately decorated tree is also 
minimally decorated. One could rule out grammars that allow non-minimal ad- 
equate decoration by additional constraints on the features of the productions 
and lexicon. This is not very relevant for the current discussion, therefore we 
bypass the issue with a definition of adequacy that is based on what ought to be 
proper composition of decorated trees. 

See Definition 3.8 on page 42 for various forms of linear tree notation. 
Definition 8.33. (result) 

Let $(\langle S \leadsto \underline{a}_1 \ldots \underline{a}_n \rangle, \varphi(\langle S \leadsto \underline{a}_1 \ldots \underline{a}_n \rangle))$ be a parse for the sentence $\underline{a}_1 \ldots \underline{a}_n$. 

The feature structure 

$$\varphi(S) = \varphi(\langle S \leadsto \underline{a}_1 \ldots \underline{a}_n \rangle)|_S$$ 

is called a result of the sentence. □ 

In context-free parsing, parse trees are delivered as results. For unification 
grammars, it is assumed that the feature structure of the sentence symbol $S$ 
contains all relevant information. The parse tree is not an interesting object 
as such; it serves only to compute $\varphi(S)$. Hence we can rephrase the parsing 
problem as follows. 

The parsing problem, given a sentence $\underline{a}_1 \ldots \underline{a}_n \in W^*$ and a grammar $\mathcal{G}$, 
is to find all results $\varphi(S)$. 

Unlike the context-free case, we can also define a reversed problem. 

The generation problem, given a grammar $\mathcal{G}$ and a feature structure $\varphi(S)$, 
is to find a sentence $\underline{a}_1 \ldots \underline{a}_n \in W^*$ for which $\varphi(S)$ is a result. 

In principle it should be possible to use a single unification grammar both for 
parsing and generation. If a grammar is to be used in both directions, it must 
be guaranteed that both the parsing algorithm and the generation algorithm 
halt. A generation algorithm typically will not halt when it is run on a 
unification grammar that was designed for use in a parser. Reversible unification 
grammars, which can be used in either direction, are studied by Appelt [1987], Shieber [1988], 
Shieber et al. [1990], Gerdemann [1991], van Noord [1993], and Strzalkowski 
[1994]. Minnen et al. [1995] take a single (HPSG) unification grammar and 
optimize it differently for parsing and generation. 



8.6 Composition of decorated trees 

In 8.5 we have defined what a valid parse tree is, but not yet how such a 
tree can be computed. We will now define an operator for tree composition. 
Using this operator, one can create ever larger and larger trees from the initial 
trees based on grammar productions and lexicon. Thus, in the framework of 
Chapter 2, we have a primordial soup populated with adequately decorated 
trees. 

The primordial soup is sound if all parse trees for the sentence that may 
appear are adequately decorated and complete if all adequately decorated 
parse trees can be constructed. 



Wedekind [1988] has given such a definition for the generation problem in Lexical- 
Functional Grammar. 
We define a decorated tree composition operator $\triangleleft_i$ and extend that to 
a nondeterministic operator $\triangleleft$ by dropping the index $i$. For technical reasons, 
the context-free tree composition operator is defined slightly differently from 
the way it was done in Chapter 2. (The difference is merely notational; the 
trees that can be composed are the same.) 

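Definition 8.34. (context-free tree composition) 

For a context-free grammar $G$ and any $i \in \mathbb{N}$ a partial function 

$$\triangleleft_i : \mathit{Trees}(G) \times \mathit{Trees}(G) \to \mathit{Trees}(G)$$ 

is defined as follows. Let $\tau = \langle X_0 \leadsto X_1 \ldots X_k \rangle$ and $\sigma = \langle Y_0 \leadsto Y_1 \ldots Y_l \rangle$ be 
context-free trees in $\mathit{Trees}(G)$. Then 

$$\tau \triangleleft_i \sigma = \begin{cases} \langle X_0 \leadsto X_1 \ldots X_{i-1} \langle X_i \leadsto Y_1 \ldots Y_l \rangle X_{i+1} \ldots X_k \rangle & \text{if } X_i = Y_0, \\ \text{undefined} & \text{otherwise.} \end{cases}$$ 

In a more practical interpretation, we interpret $\triangleleft_i$ as an operator to create 
new trees from existing trees, rather than as a function. We drop the index 
$i$ and obtain a nondeterministic operator $\triangleleft$. □ 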



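Definition 8.35. (decorated tree composition) 

For a unification grammar $\mathcal{G}$ and any $i \in \mathbb{N}$ a partial function 

$$\triangleleft_i : \mathit{VTrees}(\mathcal{G}) \times \mathit{VTrees}(\mathcal{G}) \to \mathit{VTrees}(\mathcal{G})$$ 

is defined as follows. Let $(\tau, \varphi(\tau))$ and $(\sigma, \varphi(\sigma))$ be decorated trees with $\tau = 
\langle X_0 \leadsto X_1 \ldots X_k \rangle$ and $\sigma = \langle Y_0 \leadsto Y_1 \ldots Y_l \rangle$. Then 

$$(\tau, \varphi(\tau)) \triangleleft_i (\sigma, \varphi(\sigma)) = \begin{cases} \text{undefined} & \text{if } \tau \triangleleft_i \sigma \text{ is undefined or } \varphi(\tau) \sqcup_{X_i = Y_0} \varphi(\sigma) = \bot, \\ (\tau \triangleleft_i \sigma,\ \varphi(\tau) \sqcup_{X_i = Y_0} \varphi(\sigma)) & \text{otherwise.} \end{cases}$$ 

As in Definition 8.34 we may drop the index $i$ and interpret $\triangleleft$ as a 
nondeterministic operator. 

We write $(\tau, \varphi(\tau)) \triangleleft (\sigma, \varphi(\sigma)) = \bot$ if the composition is not defined for any $i$. 

□ 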
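
The two composition operators of Definitions 8.34 and 8.35 can be sketched together in a few lines. This is a minimal illustration under stated assumptions, not the chapter's formal machinery: a decorated tree is a hypothetical triple `(label, features, children)`, feature structures are nested dictionaries with string atoms, `None` stands in for both "undefined" and $\bot$, and coreferences are not modelled, so unification only merges the features of the leaf $X_i$ and the root $Y_0$ that are identified.

```python
def unify(f, g):
    """Unify two feature structures (nested dicts, atoms as strings).

    Returns None, standing in for bottom, when the structures clash.
    Coreferences are not modelled in this sketch.
    """
    if f == {}:
        return g
    if g == {}:
        return f
    if isinstance(f, dict) and isinstance(g, dict):
        result = dict(f)
        for feature, value in g.items():
            if feature in result:
                merged = unify(result[feature], value)
                if merged is None:
                    return None
                result[feature] = merged
            else:
                result[feature] = value
        return result
    return f if f == g else None


def compose(tau, sigma, i):
    """Return tau composed with sigma at the i-th leaf of tau, or None if undefined.

    A decorated tree is a triple (label, features, children); a leaf has no children.
    """
    if i < 1:
        return None
    seen = 0  # number of leaves of tau visited so far

    def walk(node):
        nonlocal seen
        label, features, children = node
        if not children:
            seen += 1
            if seen != i:
                return node
            if label != sigma[0]:
                return None              # X_i differs from Y_0
            merged = unify(features, sigma[1])
            if merged is None:
                return None              # the decorations clash
            return (label, merged, sigma[2])
        rebuilt = []
        for child in children:
            new_child = walk(child)
            if new_child is None:
                return None
            rebuilt.append(new_child)
        return (label, features, rebuilt)

    result = walk(tau)
    return result if seen >= i else None


tau = ("S", {"cat": "S"}, [
    ("NP", {"cat": "NP"}, []),
    ("VP", {"cat": "VP", "agr": "singular"}, []),
])
sigma = ("VP", {"cat": "VP", "trans": "yes"}, [("*v", {"cat": "*v"}, [])])
clash = ("VP", {"cat": "VP", "agr": "plural"}, [("*v", {"cat": "*v"}, [])])

print(compose(tau, sigma, 2))   # VP leaf replaced by the subtree, features merged
print(compose(tau, sigma, 1))   # None: the first leaf is an NP
print(compose(tau, clash, 2))   # None: agr singular and plural do not unify
```

Dropping the index, as the definitions do, would correspond to trying every leaf position and collecting the compositions that are defined.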



The next lemma states that composition of adequately decorated trees 
yields an adequately decorated tree. This result will not come as a surprise. 
But to be formally correct it is necessary to state it as a separate result. 
Adequate decoration was defined inductively by expanding a production tree 
with adequately decorated trees. It follows easily (but not by definition) that 
arbitrary tree composition of adequately decorated trees yields an adequately 
decorated tree. 
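Lemma 8.36. 

Let $(\tau, \varphi(\tau)) \in \mathit{VTrees}(\mathcal{G})$ and $(\sigma, \varphi(\sigma)) \in \mathit{VTrees}(\mathcal{G})$ be adequately 
decorated trees. If $(\tau, \varphi(\tau)) \triangleleft (\sigma, \varphi(\sigma)) \in \mathit{VTrees}(\mathcal{G})$ then $(\tau, \varphi(\tau)) \triangleleft (\sigma, \varphi(\sigma))$ is 
also adequately decorated. 

Proof: by induction on the size of $(\tau, \varphi(\tau)) \triangleleft (\sigma, \varphi(\sigma))$. 

Let $\tau = \langle A \to \langle \alpha \leadsto \beta \rangle \rangle$, $\alpha = X_1 \ldots X_k$, $\beta = \beta_1 \ldots \beta_k$ as in Definition 8.31. In 
the composed tree $\tau \triangleleft \sigma$, some leaf in some $\beta_i$ is unified with the root of $\sigma$. 
Let $\varphi'(\langle X_i \leadsto \beta_i \rangle)$ be the adequate decoration of $\langle X_i \leadsto \beta_i \rangle$ from which the 
adequacy of $\varphi(\tau)$ is derived. Then, using the induction hypothesis, we find 
that 

$$(\langle X_i \leadsto \beta_i \rangle \triangleleft \sigma,\ \varphi'(\langle X_i \leadsto \beta_i \rangle) \sqcup \varphi(\sigma)) = (\langle X_i \leadsto \beta_i \rangle, \varphi'(\langle X_i \leadsto \beta_i \rangle)) \triangleleft (\sigma, \varphi(\sigma))$$ 

is adequate. It is easily verified that $(\tau, \varphi(\tau)) \triangleleft (\sigma, \varphi(\sigma))$ is obtained by 
composition of $(\langle A \to \alpha \rangle, \varphi'(\langle A \to \alpha \rangle))$ with $(\langle X_1 \leadsto \beta_1 \rangle, \varphi'(\langle X_1 \leadsto \beta_1 \rangle))$, 
$\ldots$, $(\langle X_{i-1} \leadsto \beta_{i-1} \rangle, \varphi'(\langle X_{i-1} \leadsto \beta_{i-1} \rangle))$, 
$(\langle X_i \leadsto \beta_i \rangle \triangleleft \sigma,\ \varphi'(\langle X_i \leadsto \beta_i \rangle) \sqcup \varphi(\sigma))$, 
$(\langle X_{i+1} \leadsto \beta_{i+1} \rangle, \varphi'(\langle X_{i+1} \leadsto \beta_{i+1} \rangle))$, $\ldots$, $(\langle X_k \leadsto \beta_k \rangle, \varphi'(\langle X_k \leadsto \beta_k \rangle))$, 
as in Definition 8.31. □ 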

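Theorem 8.37. (correctness of primordial soup for decorated trees) 

A decorated tree $(\tau, \varphi(\tau))$ with $\tau = \langle S \leadsto \underline{a}_1 \ldots \underline{a}_n \rangle$ that is obtained by tree 
composition $\triangleleft$ from decorated trees of the forms 

$(\langle A \to \alpha \rangle, \varphi_0(A \to \alpha))$ and 

$(\langle a \to \underline{a}_i \rangle, \varphi(a \to \underline{a}_i))$ with $\varphi(\underline{a}_i) \in \mathcal{L}ex(\underline{a}_i)$ and $\varphi(a) = \varphi(\underline{a}_i)$ 

is adequate. Moreover, each adequately decorated parse can be constructed 
from such trees. 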

Proof. The soundness (context-free parse trees are adequately decorated) is 
a direct consequence of Lemma 8.36. It is trivial to prove (with induction on 
the size of the tree) that all adequately decorated trees can be composed, 
hence completeness follows a fortiori. □ 



8.7 Parsing schemata for unification grammars 

In 8.5 we have introduced unification grammars and in 8.6 we have proven that 
the Primordial Soup framework for decorated trees is sound and complete. 
Integrating all this into context-free parsing schemata is mainly a matter of 
notation. 

There is, however, a single important difference between parsing schemata 
for context-free grammars and unification grammars, with far-reaching con- 
sequences. In the context-free case any item needs to be recognized only once. 
When an already recognized item is recognized again, it should be ignored. 
For unification grammars, in contrast, a single context-free item can be recognized 
multiple times, each time with a different decoration. These are to be 
regarded as different objects. Hence we may face the situation that a parsing 
schema with only a finite set of valid context-free items may yield infinitely 
many decorations to these items. 
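
The difference in bookkeeping can be made concrete with a small sketch. It assumes, for illustration only, that decorations are plain dictionaries compared by equality; the class `DecoratedChart` and its method `add` are hypothetical names. A context-free chart would store each item once, whereas here each item carries a list of distinct decorations.

```python
from collections import defaultdict


class DecoratedChart:
    """Hypothetical chart: each context-free item maps to a list of distinct decorations."""

    def __init__(self):
        self.entries = defaultdict(list)

    def add(self, item, decoration):
        """Store the decorated item; return True if it was not recognized before."""
        known = self.entries[item]
        if decoration in known:
            return False      # same item, same decoration: ignore
        known.append(decoration)
        return True           # same item, new decoration: a different object


chart = DecoratedChart()
item = ("NP", 0, 2)
print(chart.add(item, {"cat": "NP", "number": "singular"}))  # first recognition
print(chart.add(item, {"cat": "NP", "number": "singular"}))  # recognized again
print(chart.add(item, {"cat": "NP", "number": "plural"}))    # different decoration
print(len(chart.entries[item]))
```

Nothing in this structure bounds the length of the list for a single item, which is exactly the situation described above.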

At this very abstract level we will not worry about infinitely many dec- 
orations for a single context-free item. There are various ways to construct 
parsing algorithms that recognize only a relevant finite subset of valid deco- 
rated items. This will be discussed at more length in Chapter 9. 

We will first formulate a parsing schema UG that formalizes what we 
did in Section 7.2: Constituents are recognized purely bottom-up. This can 
be regarded as the canonical parsing schema for unification grammars. 

A domain of items can be defined by adding feature structures to the 
usual CYK items. We could write 

$$\mathcal{I}_{UG} = \{[(X, \varphi(X)), i, j] \mid X \in V \wedge 0 \le i \le j \wedge \varphi(X) \neq \bot\}$$ 

where $\varphi(X)$ is obtained by restricting the composite feature structure of the 
tree $\langle X \leadsto \underline{a}_{i+1} \ldots \underline{a}_j \rangle$ to the features of the top node. Throughout the 
remainder of this chapter items are decorated with feature structures, therefore 
we do not need to mention $\varphi(X)$ explicitly in the notation of an item. Hence 
we write $[X, i, j]$ as usual, rather than $[(X, \varphi(X)), i, j]$. 

The hypotheses represent all feature structures offered by the lexicon for 
all words in the sentence: 

$$H = \{[a, j-1, j] \mid \varphi(a) \in \mathcal{L}ex(\underline{a}_j)\}. \qquad (8.1)$$ 

Schema 8.38. (UG) 

It is obvious, however, that deduction steps for productions with larger right- 
hand sides can be added in similar fashion. 

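For an arbitrary unification grammar $\mathcal{G} \in \mathcal{UG}$ we define a parsing system 
$\mathbb{P}_{UG} = (\mathcal{I}_{UG}, H, D_{UG})$ by 

$$\begin{aligned}
\mathcal{I}_{UG} &= \{[X, i, j] \mid X \in V \wedge 0 \le i \le j \wedge \varphi(X) \neq \bot\}; \\
D^{(\ge 1)} &= \{[X_1, i_0, i_1], \ldots, [X_k, i_{k-1}, i_k] \vdash [A, i_0, i_k] \mid A \to X_1 \ldots X_k \in P \wedge k \ge 1 \wedge {} \\
&\qquad \varphi(A) = (\varphi_0(A \to X_1 \ldots X_k) \sqcup \varphi(X_1) \sqcup \ldots \sqcup \varphi(X_k))|_A\}, \\
D^{\varepsilon} &= \{\vdash [A, j, j] \mid A \to \varepsilon \in P \wedge \varphi(A) = \varphi_0(A \to \varepsilon)\}, \\
D_{UG} &= D^{(\ge 1)} \cup D^{\varepsilon},
\end{aligned}$$ 

and $H$ as in (8.1). 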
Many unification grammars that have been written to cover (parts of) natural 
languages have only productions that are unary or binary branching. In that 
case, the definition of the deduction steps can be simplified to: 

$$\begin{aligned}
D^{(1)} &= \{[X, i, j] \vdash [A, i, j] \mid A \to X \in P \wedge \varphi(A) = (\varphi_0(A \to X) \sqcup \varphi(X))|_A\}, \\
D^{(2)} &= \{[X, i, j], [Y, j, k] \vdash [A, i, k] \mid A \to XY \in P \wedge {} \\
&\qquad \varphi(A) = (\varphi_0(A \to XY) \sqcup \varphi(X) \sqcup \varphi(Y))|_A\}, \\
D_{UG} &= D^{(1)} \cup D^{(2)} \cup D^{\varepsilon}.
\end{aligned}$$ 

Sets of deduction steps for other values of $k$ can be added likewise. □ 
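
The unary and binary deduction steps can be tried out with a small bottom-up closure. The following is a sketch under explicit assumptions rather than an implementation of the schema: feature structures are nested dictionaries, a production is a hypothetical 4-tuple `(lhs, rhs, phi0, equations)` with `rhs` a tuple, `phi0` maps node positions 0..k to feature structures, and coreference is approximated by path equations whose values are copied until nothing changes (which presumes an acyclic constraint set). The function names `unify`, `get`, `put`, `solve` and `parse` and the toy grammar are illustrative only.

```python
import copy

def unify(f, g):
    """Unify nested-dict feature structures; None stands in for bottom."""
    if f == {} or g == {}:
        return g if f == {} else f
    if isinstance(f, dict) and isinstance(g, dict):
        result = dict(f)
        for feature, value in g.items():
            merged = unify(result.get(feature, {}), value)
            if merged is None:
                return None
            result[feature] = merged
        return result
    return f if f == g else None

def get(fs, path):
    """Value at a path: {} if absent, None if the path runs through an atom."""
    for feature in path:
        if not isinstance(fs, dict):
            return None
        fs = fs.get(feature, {})
    return fs

def put(fs, path, value):
    for feature in path[:-1]:
        fs = fs.setdefault(feature, {})
    fs[path[-1]] = value

def solve(composite, equations):
    """Enforce path equations by copying values until nothing changes (acyclic case)."""
    fs = copy.deepcopy(composite)
    changed = True
    while changed:
        changed = False
        for left, right in equations:
            a, b = get(fs, left), get(fs, right)
            merged = None if a is None or b is None else unify(a, b)
            if merged is None:
                return None
            if merged != a or merged != b:
                put(fs, left, copy.deepcopy(merged))
                put(fs, right, copy.deepcopy(merged))
                changed = True
    return fs

def parse(words, lexicon, productions):
    """Bottom-up closure; returns {(category, i, j): [decorations]}."""
    chart, agenda = {}, []

    def add(cat, i, j, fs):
        known = chart.setdefault((cat, i, j), [])
        if fs not in known:
            known.append(fs)
            agenda.append((cat, i, j, fs))

    def deduce(lhs, phi0, equations, children, i, j):
        composite = dict(phi0)
        for position, child in enumerate(children, start=1):
            composite[position] = unify(composite.get(position, {}), child)
            if composite[position] is None:
                return
        solved = solve(composite, equations)
        if solved is not None:
            add(lhs, i, j, solved.get(0, {}))  # restriction to the top node

    for j, word in enumerate(words, start=1):  # hypotheses, as in (8.1)
        for reading in lexicon[word]:
            add(reading["cat"], j - 1, j, reading)
    while agenda:
        cat, i, j, fs = agenda.pop()
        for lhs, rhs, phi0, equations in productions:
            if rhs == (cat,):
                deduce(lhs, phi0, equations, [fs], i, j)
            if len(rhs) != 2:
                continue
            for (other_cat, a, b), decorations in list(chart.items()):
                for other in list(decorations):
                    if rhs[0] == cat and rhs[1] == other_cat and a == j:
                        deduce(lhs, phi0, equations, [fs, other], i, b)
                    if rhs[1] == cat and rhs[0] == other_cat and b == i:
                        deduce(lhs, phi0, equations, [other, fs], a, j)
    return chart

productions = [("S", ("NP", "VP"),
                {0: {"cat": "S"}, 1: {"cat": "NP"}, 2: {"cat": "VP"}},
                [((1, "agr"), (2, "agr")), ((0, "agr"), (2, "agr"))])]
lexicon = {"cats": [{"cat": "NP", "agr": "plural"}],
           "sleep": [{"cat": "VP", "agr": "plural"}],
           "sleeps": [{"cat": "VP", "agr": "singular"}]}
print(parse(["cats", "sleep"], lexicon, productions).get(("S", 0, 2)))
print(parse(["cats", "sleeps"], lexicon, productions).get(("S", 0, 2)))
```

In the toy grammar the agreement equation lets the first sentence yield a decorated item for S over the whole string, while the second sentence yields none because the agr values clash. The chart stores a list of decorations per context-free item, so the same item may be recognized more than once.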

It is not necessarily the case that the parsing schema UG yields a finite 
set of decorated items for an arbitrary grammar and sentence; even worse, 
the parsing problem for an arbitrary unification grammar is undecidable. 
Several sufficient conditions that guarantee finiteness of the UG schema are 
known from the literature, but no general necessary and sufficient condition 
is known. Hence we simply assume that a grammar $\mathcal{G}$ has been defined in 
such a way that the parsing schema UG will halt. For unification grammars 
designed for parsing natural languages this does not seem to be a problem. 
The underlying idea is that the meaning of a sentence, which will be captured 
somewhere in the result, is derived compositionally from the meaning of the words, 
via intermediate constituents; there is little reason to write a grammar such 
that ever more meaning is added to the same constituent. 

In the sequel, we will assume that a unification grammar $\mathcal{G}$ has the property 
that for any string only a finite number of valid decorated items exists. 
How the grammar writer guarantees that this is the case (for example by 
making sure that one of the sufficient conditions is kept) is of no concern to 
us here. When we discuss other parsing schemata, the finiteness issue will 
come up again. Adding other fancy kinds of deduction steps, notably top-down 
prediction of features, may jeopardize the finiteness. In such a case 
we will show for a newly defined schema $\mathbf{P}$ that if a parsing system $\mathbf{UG}(\mathcal{G})$ 
halts, then $\mathbf{P}(\mathcal{G})$ will also halt. In other words, the finiteness in bottom-up 
direction is the responsibility of the grammar writer, whereas the finiteness 
in top-down direction is the responsibility of the parser constructor. 

Earley-type parsers for unification grammars that incorporate top-down 
prediction are discussed, among others, by Shieber [1985a], Haas [1989], and 
Shieber [1992]. In Chapter 11 a head-driven parsing schema will be defined 
that starts parsing those words that can be expected to yield features that 
are most restrictive for top-down prediction. 



The off-line parsability constraint [Bresnan and Kaplan, 1982] and the stronger 
notion of depth-boundedness [Haas, 1989] guarantee finiteness. 
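We will now look at an Earley parser, formalizing what has been informally 
explained in Section 7.1. A domain of items for the Earley schema is 
properly described by 

$$\begin{aligned}
\mathcal{I}_{Earley(UG)} = \{[(A \to \alpha \bullet \beta, \varphi(A \to \alpha \bullet \beta)), i, j] \mid{} & A \to \alpha\beta \in P \wedge 0 \le i \le j \wedge {} \\
& \varphi_0(A \to \alpha\beta) \sqsubseteq \varphi(A \to \alpha \bullet \beta) \wedge {} \\
& \varphi(A \to \alpha \bullet \beta) \neq \bot\}; \qquad (8.2)
\end{aligned}$$ 

In order to simplify the notation, we attach identifiers to items. When an 
item is subscripted with a symbol $\eta, \zeta, \ldots$, this symbol can be used in the 
remainder of the expression to identify the item. Moreover, we write $\varphi(\xi)$ 
for the feature structure $\varphi(A \to \alpha \bullet \beta)$ of an item $[A \to \alpha \bullet \beta, i, j]_\xi$. 

Furthermore, as with the CYK items, we do not mention the feature structure 
explicitly in the item. Thus we simplify (8.2) to 

$$\begin{aligned}
\mathcal{I}_{Earley(UG)} = \{[A \to \alpha \bullet \beta, i, j]_\xi \mid{} & A \to \alpha\beta \in P \wedge 0 \le i \le j \wedge {} \\
& \varphi_0(A \to \alpha\beta) \sqsubseteq \varphi(\xi) \wedge \varphi(\xi) \neq \bot\}; \qquad (8.3)
\end{aligned}$$ 

Another useful notational convention is the following. Rather than writing 
$\varphi(\xi)|_X$ for the feature structure of $X$ derived from some composite feature 
structure within an item $\xi$ we write $\varphi(X_\xi)$. 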
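Schema 8.39. (Earley(UG)) 

For an arbitrary unification grammar $\mathcal{G} \in \mathcal{UG}$ a parsing system $\mathbb{P}_{Earley(UG)} = 
(\mathcal{I}_{Earley(UG)}, H, D_{Earley(UG)})$ is defined by $\mathcal{I}_{Earley(UG)}$ as in (8.3); 

$$\begin{aligned}
D^{Init} &= \{\vdash [S \to \bullet\gamma, 0, 0]_\xi \mid \varphi(\xi) = \varphi_0(S \to \gamma)\}, \\
D^{Scan} &= \{[A \to \alpha \bullet a\beta, i, j]_\eta, [a, j, j+1]_\zeta \vdash [A \to \alpha a \bullet \beta, i, j+1]_\xi \mid \varphi(\xi) = \varphi(\eta) \sqcup \varphi(a_\zeta)\}, \\
D^{Compl} &= \{[A \to \alpha \bullet B\beta, i, j]_\eta, [B \to \gamma\bullet, j, k]_\zeta \vdash [A \to \alpha B \bullet \beta, i, k]_\xi \mid \varphi(\xi) = \varphi(\eta) \sqcup \varphi(B_\zeta)\}, \\
D^{Pred} &= \{[A \to \alpha \bullet B\beta, i, j]_\eta \vdash [B \to \bullet\gamma, j, j]_\xi \mid \varphi(\xi) = \varphi(B_\eta) \sqcup \varphi_0(B \to \gamma)\}, \\
D_{Earley(UG)} &= D^{Init} \cup D^{Scan} \cup D^{Compl} \cup D^{Pred},
\end{aligned}$$ 

and $H$ as in (8.1). □ 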

A unification grammar $\mathcal{G}$ for which $\mathbf{UG}(\mathcal{G})$ is finite may cause an infinite 
number of top-down predictions. A simple way to solve this (and the standard 
way to parse a unification grammar with a conventional active chart parser) 
is to limit the top-down prediction to the context-free backbone and replace 
$D^{Pred}$ by 

$$D^{Pred'} = \{[A \to \alpha \bullet B\beta, i, j]_\eta \vdash [B \to \bullet\gamma, j, j]_\xi \mid \varphi(\xi) = \varphi_0(B \to \gamma)\}.$$ 
It is not difficult to show that the modified Earley schema yields only finitely 
many different decorated items if the UG schema is known to do so. In Chap- 
ter 9 we will investigate more sophisticated techniques to prevent infinitely 
many decorations for a single context-free item. 

We have given two examples of parsing schemata for unification gram- 
mars. It is clear that other context-free parsing schemata can be extended 
with feature structures in similar fashion. 



8.8 The example revisited 

We return to the example of Section 7.2 and show how the schema Earley(UG) 
can be used to parse our example sentence. The lexicon and productions 
for the cat catches a mouse were shown in Figures 7.1 and 7.2 on pages 133 
and 134. In a PATR-style grammar, the composite feature structures $\varphi_0$ 
are typically denoted by a constraint set. Here we will represent all feature 
structures, single and composite, by AVMs. 

In an Earley item of the form $[A \to \alpha \bullet \beta, i, j]$, we are interested only in 
the features of $A$ and $\beta$. Features of $A$ will be used to transfer information 
upwards through a parse tree (when an item $[A \to \alpha\beta\bullet, i, k]$ is used at some 
later stage as the right operand of a complete step). Features of $\beta$ that are 
known already are used as a filter to guarantee that $\beta$ will be of "the right 
kind" in whatever sense imposed by those features. The features of $\alpha$ need 
not be remembered. Features of $\alpha$ that are of interest for the remainder of 
the parsing process will have been shared with $A$ or $\beta$; other features are 
irrelevant. Our purpose here is to construct a resulting feature structure for $S$, rather 
than a context-free parse. 

We start with an item $[S \to \bullet NP\ VP, 0, 0]$, supplied with the features from 
$\varphi_0(S \to NP\ VP)$. The decorated item is shown in Figure 8.4. 

No features are predicted for the subject (other than that its category should 
be NP). Hence, an item $[NP \to \bullet {*det}\ {*n}, 0, 0]$ is predicted that is decorated 
with $\varphi_0(NP \to {*det}\ {*n})$. For the sake of brevity we skip the deduction steps 

$[NP \to \bullet {*det}\ {*n}, 0, 0],\ [{*det}, 0, 1] \vdash [NP \to {*det} \bullet {*n}, 0, 1],$ 

$[NP \to {*det} \bullet {*n}, 0, 1],\ [{*n}, 1, 2] \vdash [NP \to {*det}\ {*n}\bullet, 0, 2];$ 

the reader may verify that the decorated item $[NP \to {*det}\ {*n}\bullet, 0, 2]$ as displayed 
in Figure 8.5 is obtained. A complete step combines the items of Figures 
8.4 and 8.5 into a decorated item $[S \to NP \bullet VP, 0, 2]$ as shown in Figure 
8.6. The features of the NP have been included in the VP through coreferencing. 

From Figure 8.6 we predict an item $[VP \to \bullet {*v}\ NP, 2, 2]$, as shown in Figure 
8.7. The subject feature that is shared between VP and *v causes the subject 
information to be passed down to the verb. Consequently, a verb can be 
accepted only if it allows a subject in third person singular. This is indeed 
the case for the initial item $[{*v}, 2, 3]$, decorated with the lexicon entry for 
catches on page 133. Hence we obtain the item $[VP \to {*v} \bullet NP, 2, 3]$ with a 
decoration as shown in Figure 8.8. The *v entry has been deleted, as its 
salient features are also contained in the VP feature structure. Note that 
$\langle NP\ head\ trans \rangle$ is now coreferenced with $\langle VP\ head\ trans\ arg2 \rangle$, through 
the coreference in the (no longer visible) feature structure of the verb. 

Fig. 8.4. The initial item 

Fig. 8.5. A completed NP 

Fig. 8.6. Complete applied to Figures 8.4 and 8.5 

Fig. 8.7. Predict applied to Figure 8.6 

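We can continue to deduce decorated items in similar fashion. It is left to 
the reader to verify that application of the deduction steps 

$[VP \to {*v} \bullet NP, 2, 3] \vdash [NP \to \bullet {*det}\ {*n}, 3, 3],$ 

$[NP \to \bullet {*det}\ {*n}, 3, 3],\ [{*det}, 3, 4] \vdash [NP \to {*det} \bullet {*n}, 3, 4],$ 

$[NP \to {*det} \bullet {*n}, 3, 4],\ [{*n}, 4, 5] \vdash [NP \to {*det}\ {*n}\bullet, 3, 5],$ 

$[VP \to {*v} \bullet NP, 2, 3],\ [NP \to {*det}\ {*n}\bullet, 3, 5] \vdash [VP \to {*v}\ NP\bullet, 2, 5],$ 

$[S \to NP \bullet VP, 0, 2],\ [VP \to {*v}\ NP\bullet, 2, 5] \vdash [S \to NP\ VP\bullet, 0, 5]$ 

results in a decorated final item as shown in Figure 8.9. 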
Fig. 8.8. Scan applied to Figure 8.7 and “catches” on page 133 

Fig. 8.9. A final item 



8.9 Other grammar formalisms 

We will briefly mention some different kinds of unification grammars and then 
discuss the related formalisms of attribute grammars and affix grammars. 

The earliest type of unification grammar is Definite Clause Grammar 
(DCG), defined by Pereira and Warren [1980]. DCG is based on terms rather 
than feature structures. It is inextricably linked with the programming lan- 
guage Prolog [Clocksin and Mellish, 1981]. DCG, basically, offers some addi- 
tional syntactic sugar for encoding grammars directly into Prolog. 

In the last decade, a variety of grammar formalisms based on feature structure 
unification has emerged. The Computational Linguistics community has 
been enriched with Lexical-Functional Grammar (LFG) [Kaplan and Bresnan, 
1982], Functional Unification Grammar (FUG) [Kay, 1979, 1985], Generalized 
Phrase Structure Grammar (GPSG) [Gazdar et al., 1985], PATR 
[Shieber, 1986], Categorial Unification Grammar (CUG) [Uszkoreit, 1986], Unification 
Categorial Grammar (UCG) [Zeevat et al., 1987], Head-Driven Phrase 
Structure Grammar (HPSG) [Pollard and Sag, 1987, 1994], and Unification-based 
Tree Adjoining Grammars (UTAG) [Vijay-Shanker et al., 1991]. This list is 
not exhaustive. 

The word “grammar” that appears in all these formalisms, has subtly 
different meanings in different cases. On the one hand, one can see grammar 
as a formalism that has no meaning per se, but can be used to encode gram- 
mars for whatever purpose. Typical examples of this class are DCG, FUG 
and PATR. On the other hand, one can interpret grammar as a description 
of phenomena that occur in natural language. Such a grammar does not only 
offer a formalism but, more importantly, also a linguistic theory that is ex- 
pressed by means of that formalism. Typical examples of this class are LFG, 
GPSG and HPSG. 

The feature structure formalism that we have used here is taken from the 
1986 version of PATR (with the exception of the extension to composite feature 
structures). It was designed by Shieber to be the simplest feature structure 
formalism, containing only the bare essentials. A lot of bells and whistles can 
be added, of course. The use of lists, which is admittedly cumbersome in 
PATR notation, can be simplified by introducing a special list notation. A 
useful extension to increase the efficiency of unification grammar parsing 
is coverage of disjunctive feature structures. We will come back to this in 
Chapter 9. 

We have used untyped feature structures: any feature can have any value. 
In a typed feature structure formalism, the value of a feature is restricted to 



The formalism is called PATR-II, to be precise, and quite different from a first 
version of PATR that has fallen into oblivion (and hence the letters “PATR” in 
PATR-II no longer form an acronym). 
particular types specifically defined for that feature [Emele and Zajac, 1990]. 
See [Carpenter, 1992] for a textbook on typed feature structures. 

We have stipulated - as in PATR - that feature graphs contain no cy- 
cles. The practical reason is that it simplifies the unification algorithms, and 
cyclic feature structures seem to have little linguistic relevance. In HPSG, 
the feature formalism does not explicitly ban cycles, but in the first version 
[Pollard and Sag, 1987] they simply did not occur in any of the types pre- 
scribed for HPSG grammars. The 1993 version of HPSG [Pollard and Sag, 
1994], however, has somewhat different types and found an application for 
cyclic structures. Some linguists argue that the head of a noun phrase is the 
determiner, rather than the noun (the so-called DP hypothesis). In the latest 
version of HPSG, this matter is solved by letting both the determiner and 
the noun regard themselves as head of the NP and each other as a subor- 
dinate constituent. Hence either constituent is subordinate to a subordinate 
structure of itself. 

Unification grammars are related to attribute grammars, introduced by 
Knuth [1968, 1971], that have been used in compiler construction for 25 years. 
There are some basic differences between attribute grammars and unification 
grammars, but from a formal point of view there is little objection to calling 
both constraint-based formalisms. The difference between both formalisms is 
to a large extent a difference in culture: attribute grammars are typically used 
by computer scientists to denote the semantics of programming languages, 
while unification grammars are typically used by computational linguists to 
capture syntactic and semantic properties of natural languages. 

Attribute grammars stem from the age when high-level programming languages 
were all imperative languages. The basic statement is the assignment: 
a value, obtained from evaluating an expression, is assigned to an identifier. 
Expressions can be functions (i.e. sub-programs computing a value) of arbitrary 
sophistication. Within the imperative programming paradigm, therefore, 
it is the most natural approach to define attributes of a constituent 
as functions of other attributes of other constituents. The constraints in an 
attribute grammar can be thought of as assignments: 

(attribute) := (expression) 

where (expression) is a function of attributes of other symbols in the same 
production. 
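The assignment view can be illustrated with a minimal sketch. The toy expression grammar, the tuple encoding of parse trees and the attribute name val are assumptions made for this illustration only; they do not occur elsewhere in this book. Each branch corresponds to one production and computes the attribute of the left-hand side as a function of attributes of the right-hand side symbols.

```python
# Hypothetical toy grammar (an assumption for illustration):
#   E -> E1 "+" T    E.val := E1.val + T.val
#   T -> T1 "*" F    T.val := T1.val * F.val
#   F -> number      F.val := value of the number
# Parse trees are encoded as nested tuples: ("add", l, r), ("mul", l, r), ("num", n).

def val(node):
    """Compute the synthesized attribute val of a parse tree node."""
    kind = node[0]
    if kind == "num":
        return node[1]
    if kind == "add":
        # attribute of the parent is a function of the attributes of its children
        return val(node[1]) + val(node[2])
    if kind == "mul":
        return val(node[1]) * val(node[2])
    raise ValueError("unknown production: %r" % (kind,))


tree = ("add", ("num", 2), ("mul", ("num", 3), ("num", 4)))
result = val(tree)
```

The direction of computation is fixed here: val of a parent can be obtained from its children, but not the other way around.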

One could also use attribute grammars within the functional programming
paradigm. Lazy evaluation can be used to solve some dependency problems more
easily and more elegantly than in the imperative paradigm, but the central
notion of functional dependency remains.

Unification grammars, in comparison, draw heavily upon the declarative
programming style as incorporated in Prolog. A Prolog clause foo(X,Y) specifies
the relation between X and Y. If X is instantiated then foo can be used
to assign a value to a variable Y and, conversely, if Y is instantiated then a
variable X can get a value by calling foo. Similarly, in unification grammars
we specify (commutative) equations that have to be true. The order in which the
features have to be computed is irrelevant; it is not even possible to express
such considerations within the formalism.

Research on attribute grammars, therefore, tends to focus on other issues 
than research on unification grammars. A classical issue is that of noncircu- 
larity: if there is a circle of attributes in a parse tree that are all functionally 
dependent on each other, then it is impossible to compute a decoration for 
the tree. An often used sufficient (but not necessary) condition is that of 
L-attributedness. An attribute grammar is L-attributed, informally speaking, 
if all attributes can be computed in a single pass in a top-down left-to-right 
walk through a context-free parse tree. A subclass that is particularly useful
in compiler construction is the class of LR-attributed grammars. These,
roughly speaking, allow the attributes to be computed on the fly by an LR
parser. The literature contains a host of different parsing algorithms for
LR-attributed grammars (see, e.g., Jones and Madsen [1980], Pohlmann [1983],
Nakata and Sassa [1986], Sassa et al. [1987], and Tarhio [1988]). Each one
defines a particular class of grammars on which it is guaranteed to work cor- 
rectly. All these classes are subtly different, however, because they depend 
on the guts of the proposed algorithm. A taxonomy is presented by op den 
Akker, Melichar and Tarhio [1980]. A fundamental treatment of attribute 
evaluation during generalized LR parsing (cf. Chapter 12) is given by Oude 
Luttighuis and Sikkel [1992, 1993]. 
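The noncircularity issue mentioned above can be made concrete for a single decorated parse tree. The following is a minimal sketch, not one of the algorithms from the cited literature: it assumes that the dependencies between attribute instances are given as a dictionary (the name deps and the attribute names in the example are hypothetical) and searches for a circle with a depth-first search.

```python
from typing import Dict, List


def find_cycle(deps: Dict[str, List[str]]) -> List[str]:
    """Return one circle of attribute instances, or [] if there is none.

    deps maps an attribute instance to the instances that its defining
    expression reads (a hypothetical encoding chosen for this sketch).
    """
    WHITE, GREY, BLACK = 0, 1, 2          # unvisited, on current path, finished
    colour = {a: WHITE for a in deps}
    path: List[str] = []

    def visit(a: str) -> List[str]:
        colour[a] = GREY
        path.append(a)
        for b in deps.get(a, []):
            if colour.get(b, WHITE) == GREY:
                # b is on the current path: the dependencies are circular
                return path[path.index(b):] + [b]
            if colour.get(b, WHITE) == WHITE:
                cycle = visit(b)
                if cycle:
                    return cycle
        path.pop()
        colour[a] = BLACK
        return []

    for a in list(deps):
        if colour[a] == WHITE:
            cycle = visit(a)
            if cycle:
                return cycle
    return []
```

If a circle is reported, no evaluation order exists and the tree cannot be decorated. Deciding noncircularity for a grammar as a whole (for all possible parse trees) is a harder problem that this sketch does not address.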

“There are no fundamental differences between affix grammars [. . .] and 
attribute grammars [. . .]”, Koster [1991a] remarks in an article on affix gram- 
mars for programming languages. “The two formalisms differ in origin and 
notation, but they are both formalizations of the same intuition: the extension 
of parsers with parameters”.

Affix grammars are a particular kind of two-level or van Wijngaarden 
grammars [van Wijngaarden, 1965], and were formalized by Koster [1971]. 
One can see the context-free productions of an affix grammar as production 
schemata, defining sets of productions for different combinations of affix
values that can be attributed to the symbols involved in the production. Hence,
even though grammars written as an affix grammar can be automatically
translated to attribute grammars, and vice versa, the basic formalism of affix
grammars is more general, because it lacks the predominant concern with
functional dependency.

In the actual practice of Prolog programming, however, few clauses really
allow such bidirectional use. There is a difference between specification and
computation: it is very well possible that Prolog gets stuck in an infinite
loop if the “wrong” argument is uninstantiated. This is similar to the fact
that a unification grammar designed for parsing typically can’t be used for
generation, although the general formalism is bidirectional.
Unification grammars with a finite feature lattice can be formulated directly
as affix grammars (so-called Affix Grammars over a Finite Lattice
(AGFL), see Koster [1991b] for a simple introduction). Typical linguistic
phenomena that can be modelled with finite feature lattices, or a finite
domain of typed feature structures, are conjugation (i.e. the different forms
of a verb) and declension (forms of nouns, adjectives, etc.).

The main difference between affix grammars and both attribute gram- 
mars and unification grammars is again a cultural one. The school of affix 
grammars has its own followers and its own formalism, but the work done 
in that area can be formulated in terms of attribute grammars or unification 
grammars as well. 



8.10 Related approaches 

Some explicit parsing algorithms for unification grammars have been given in 
the literature. Haas [1989] gives a GHR algorithm (i.e. Graham, Harrison, and 
Ruzzo’s optimization of Earley’s algorithm, cf. Example 6.18) for grammars 
based on terms. A parsing schema for ID/LP grammars has been presented 
by Morawietz [1995]. 

Shieber [1992] gives an Earley parser for a general class of unification 
grammars, rather than just the PATR-formalism. The notation of Shieber 
[1992] - as opposed to the PATR variant of [Shieber, 1986], on which our 
treatment of unification grammars is based - allows for explicit control of 
feature percolation within productions; a production A → X_1 ... X_k is a
structure with features 0, ..., k that address the separate constituents. Our
concept of multi-rooted feature structures for describing feature sharing between
different objects is more general, because it can deal with arbitrary object 
structures. 

The subject discussed here has some clear links with Shieber [1992], but 
we have taken a rather different perspective. Whereas Shieber gives a most 
general account of unification grammars and discusses only a single parsing 
algorithm, we have used just a simple unification grammar but given a
formalism that allows arbitrary parsing algorithms to be specified in a precise
but conceptually clear manner.



8.11 Conclusion 

The main contribution of this chapter is the combination of parsing schemata 
and unification grammars in a single framework. Using the proper notation, 
parsing schemata for unification grammars are a straightforward extension 
of context-free parsing schemata. The hardest task was in fact to come up 
with a proper notation. Both parsing algorithms and unification grammars
are complex problem domains on their own. In order to combine them into a
single framework, a large conceptual machinery and a rich notation are needed.
It is for good reason that most articles in the literature are specific in one 
domain, and informal in the other. 

Context-free parsing is a computational problem area. A parse tree can 
be defined as an object that satisfies certain properties, but the only way to 
find these properties for a given sentence is to actually construct the parse 
tree. From this point of view, attribute grammars are the more natural way 
to extend context-free parsing with constraints and semantic functions. Dec- 
orating a tree with attributes (whether simultaneously or in a second pass) 
is indeed application of functions. 

The literature on unification grammars, on the other hand, has a strong 
focus on the declarative character of such a grammar. One describes the 
constraints that are implied by the grammar, and the properties of individual 
words in the lexicon. The theory leans heavily on logic, hence the prime 
operational concern is that constraints can be expressed in a subset of first- 
order logic that allows automatic constraint resolution. This being proven, 
one can leave the act of satisfying the constraints to an appropriate machine. 
From this point of view it makes sense to concentrate on the static aspects 
of the grammar, rather than on the dynamic aspects of how to construct a 
parse. 

The dynamics of unification and resolution per se have been studied
extensively in the literature. They constitute an auxiliary domain that is used as a
tool in the construction of parsers for unification grammars, often in the form 
of the Prolog programming language. We have added a simple formalism that 
allows explicit specification of the dynamics of feature structure propagation 
in parsing algorithms. 




9. Topics in unification grammar parsing



Context-free parsing schemata can be translated straightforwardly into pars- 
ing algorithms. Such naive implementations might not be the most efficient 
parsers, and one can improve the efficiency a lot by adding various kinds of 
sophistication to the algorithm, but it is obvious how a first, simple imple- 
mentation can be derived from a parsing schema. For unification grammars, 
however, it is not self-evident how a parsing schema can be translated to even 
a prototype parsing algorithm. In this chapter we will discuss various issues 
that have to be addressed in order to obtain practical parsers for unification 
grammars. 

This chapter mostly surveys other research, rather than presenting our 
own, but, for the above reason, we felt it useful to include it in this book. 

An important issue that we have ignored so far is unification: how to 
compute the lub of two feature structures. We know that lubs exist, because 
of the lattice structure, so we can write them down in parsing schemata. But 
when parsing schemata are to be turned into parsing algorithms we must 
know how to unify. Section 9.1 gives an overview of feature structure unifica- 
tion and presents a simple unification algorithm in detail. More sophisticated 
versions are discussed in 9.2 and 9.3. 

Another issue that enhances the practical value of unification grammars 
is disjunction within feature structures. Theoretically, a disjunctive feature 
structure can be seen as a short notation for a set of non-disjunctive feature 
structures. From a practical point of view, however, it won’t do to have to 
rewrite everything into disjunctive normal form before feature structures can 
be unified. How to handle disjunction is discussed in 9.4. 

In Chapter 8 we have noted that a single context-free item may, in princi- 
ple, have an infinite number of different decorations. In Sections 9.5 and 9.6 
we discuss restrictors that discard irrelevant features from a feature structure. 
This solves the problem of potentially infinite chains of predictions. 

A more general - and more important - use of restrictors is discussed in 
9.7. There are, in principle, two fundamentally different ways to construct 
a parse for a sentence. In a one-pass parser, each item is attributed with 
features when it is recognized. An alternative strategy is employed by a
two-pass parser, which constructs a set^ of context-free parse trees first and adds
suitable decorations in a second pass. Using restrictors, one can construct 
intermediate kinds of parsers, that take only some features into account in 
the first pass, while other features are added in a second pass. 



9.1 Feature graph unification 

In Chapter 8 we have dodged the issue of how to compute a lub φ_1(X) ⊔ φ_2(X)
of two arbitrary feature structures φ_1(X) and φ_2(X). The lattice structure
guarantees its existence, and examples were simple enough to do unification
“by hand”.

There is a wealth of literature on the subject; one could even speak of
unification theory as a field of its own. As this topic is of such central impor- 
tance to unification grammars, we make a digression from the main theme 
and discuss the algorithmic aspects of feature structure unification in some 
detail. 

A good introduction to unification theory is given by Siekmann [1989];
a survey of algorithms and applications is provided by Knight [1989]. It is 
important to note, however, that unification theory is concerned with term 
unification, which is not exactly the same as feature structure unification. 
Feature structures can be seen as an extension of terms. The most salient dif- 
ference is that feature structures allow coreferencing of arbitrary substructures 
whereas terms only allow coreferencing of leaves^. Hence it is not self-evident
that a term unification algorithm can be extended to a feature structure uni- 
fication algorithm. In many cases, however, the extension to feature structure 
unification is straightforward. In the sequel we will give such an adaptation 
of the algorithm of Huet [1976] as an easy and efficient algorithm for feature 
structure unification. 

We give a formal definition of term graphs similar to Definition 8.3 for 
feature graphs. This is only meant to formally write down the difference 
between both concepts; we will make no further use of term graphs. 

Definition 9.1. (term graphs)

We assume a domain of functions where each function has a fixed
arity (i.e. number of arguments taken by the function). Functions with 0
arguments are also called constants, denoted a, b, .... Furthermore, we have
a set of variables x, y, ....

A term graph is a (finite) tree with the following properties:

(i) Every non-leaf vertex v is labelled with a function. Let n be the arity of
the function, then there are n (ordered) outgoing edges from v.

(ii) Every leaf is labelled with a constant or a variable.

The edges are not labelled. □

^ Or a shared forest, cf. Section 12.4.

^ Some term unification algorithms make use of subgraph sharing for the sake of
efficiency. Consider, for example, a term f(g(a,h(x)),h(g(a,h(x))),y) in which,
using graph representation, the term g(a,h(x)) can be represented by a single
subgraph. It should be stressed, though, that sharing of subgraphs in term graphs
can always be done because it doesn’t change the interpretation of the term!
Token identity (other than variables carrying the same names) is a concept that
simply does not apply to terms.

A term can be extended by instantiating a variable with another term. But
it is essential that the same variable (if it occurs more than once in the term)
is instantiated to the same term. Hence we can see a term tree as a directed
acyclic graph (dag) that allows subgraph sharing only for leaves labelled
with variables, not for other kinds of substructures.
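The requirement that all occurrences of a variable are instantiated alike can be sketched as follows. The classes Var and Fn and the function instantiate are names introduced for this illustration only; terms are represented as immutable trees, with constants as functions of arity 0.

```python
from dataclasses import dataclass
from typing import Dict, Tuple, Union


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Fn:
    symbol: str
    args: Tuple["Term", ...] = ()     # a constant is a function without arguments


Term = Union[Var, Fn]


def instantiate(term: Term, binding: Dict[str, Term]) -> Term:
    """Replace every occurrence of a bound variable by the same term."""
    if isinstance(term, Var):
        return binding.get(term.name, term)
    return Fn(term.symbol, tuple(instantiate(a, binding) for a in term.args))


# f(x, g(x, a)) with x instantiated to h(y) becomes f(h(y), g(h(y), a))
t = Fn("f", (Var("x"), Fn("g", (Var("x"), Fn("a")))))
r = instantiate(t, {"x": Fn("h", (Var("y"),))})
```

Because the binding is looked up by variable name, both occurrences of x necessarily receive the same term, which is exactly the kind of sharing that a term graph allows.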

We will not be concerned with terms any further and turn to the question of
how feature structures can be unified. This is easiest to carry out in graph
representation. We will
present a feature graph unification algorithm that is a straightforward adap- 
tation of the algorithm of Huet [1976] for term unification. The task is to 
create a new feature graph which is the lub of two given feature graphs. 
We call the new graph the unifact and the given graphs the operands^. For 
the sake of clarity we assume that the operands are single feature graphs. 
Extension to composite feature graphs is trivial. 

The general principle of the algorithm is quite simple. Input are two fea- 
ture graphs as operands (represented by their root vertex). The algorithm 
computes an equivalence relation on the vertices of both operands, such that 
each equivalence class corresponds to a single vertex in the unifact. Initially, 
all equivalence classes are singletons, except the roots of the two operands, 
which form a single class. When two equivalent vertices have a feature in com- 
mon, then the children corresponding to these features must be equivalent as 
well. That means, their equivalence classes have to be merged. In this way a 
“transitive closure” can be computed, either recursively or by keeping a list 
of pairs of vertices that still have to be dealt with. Unification fails (and ⊥
is delivered as unifact) if an equivalence class contains a pair of incompatible 
vertices. Two vertices are incompatible if 

• one is a leaf labelled with a constant and the other is a non-leaf vertex, or 

• both are leaves but labelled with different constants. 

When no more equivalence classes need to be merged, and no incompatibility
has appeared, the unifact can be computed by contracting the classes to single
vertices. This has the consequence, however, that the operands are destroyed.
Therefore, this method is called destructive unification. In 9.2 we will discuss
a nondestructive unification algorithm.

^ It is tempting to call a graph that is to be unified a “unificand”, by analogy to
“operand”. The proper form, however, following the Latin etymology, should be
the gerundive “unifacend”. This does not have an equally persuasive connotation
for the mathematical reader, hence we stick to “operand”.

Manipulation of the equivalence classes is done by the UNION and FIND
operations as given by Aho, Hopcroft and Ullman [1974]. Vertices have an 
additional class pointer that is used for maintaining the classes. The vertices 
that comprise a class are linked in a tree structure (not to be confused with 
the DAG structure of the operands!). Each class has a unique representative: 
the root of its class tree. 

The UNION operation merges two classes, simply by making the repre- 
sentative of one class a child of the representative of the other class. The 
latter vertex henceforth represents the merged class. As a general policy, the 
representative of the larger class becomes the joint class representative. 

The class representative of any vertex can be found by traversing a path 
along the class pointers. The FIND operation searches for the root of a class
tree in a slightly more subtle way: whenever a path to the root is accessed,
all vertices on that path are made direct descendants of the root. Thus a
deep class tree is flattened by access. This makes the complexity of the FIND
operator (almost) independent of the size of a class. 
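The two operations can be sketched in a few lines. This is a minimal illustration under our own naming (the class UnionFind and its dictionaries parent and size are assumptions, not the presentation of Aho, Hopcroft and Ullman): union makes the representative of the larger class the joint representative, and find flattens the path it has traversed.

```python
class UnionFind:
    """Equivalence classes as trees of class pointers (illustrative sketch)."""

    def __init__(self):
        self.parent = {}   # class pointer of every element
        self.size = {}     # class size, only meaningful for representatives

    def add(self, v):
        if v not in self.parent:
            self.parent[v] = v      # a fresh element represents itself
            self.size[v] = 1

    def find(self, v):
        # first pass: walk up to the root of the class tree
        root = v
        while self.parent[root] != root:
            root = self.parent[root]
        # second pass: make every vertex on the path a direct child of the root
        while self.parent[v] != root:
            nxt = self.parent[v]
            self.parent[v] = root
            v = nxt
        return root

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return ra
        # the representative of the larger class wins; on a tie the first argument
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]
        return ra
```

The tie-breaking rule (first argument wins) is a choice made here for determinism; any rule would do.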

This general scheme for merging equivalence classes is called the
UNION-FIND algorithm. The complexity of a sequence of n UNION and FIND operations
on a graph of arbitrary size is almost linear: O(n α(n)), with α a very slowly
increasing function. α is the inverse of a function F, characterized by F(0) =
1, F(n) = 2^F(n−1). Hence we find α(2^16) = 4, α(2^65536) = 5. When the
UNION-FIND algorithm is used for feature graph unification in the context
of natural language parsing, it is pretty hard to come up with a realistic
example where a class comprises as many as half a dozen vertices. Hence the
non-linear factor in the complexity of the algorithm is purely theoretical and
has no practical relevance at all.
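How slowly α grows can be checked with a small computation. The sketch below assumes the convention F(0) = 1, F(n) = 2^F(n−1) and takes α(n) to be the smallest k with F(k) ≥ n; both the convention and the function names F and alpha are assumptions of this illustration.

```python
def F(k: int) -> int:
    """Tower function: F(0) = 1, F(k) = 2 ** F(k - 1)."""
    value = 1
    for _ in range(k):
        value = 2 ** value
    return value


def alpha(n: int) -> int:
    """Smallest k such that F(k) >= n (the inverse of F, under our assumptions)."""
    k, value = 0, 1
    while value < n:
        value = 2 ** value
        k += 1
    return k


small = [alpha(n) for n in (1, 2, 4, 16, 17)]
```

Under these assumptions alpha(2 ** 16) is 4, and no class of a size that could ever be stored in a computer yields a value above 5.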

In order to write down the algorithm in a more tangible form, we assume 
that vertices in a feature graph carry the following attributes: 

• features: a list of pairs (f, p) where f is a feature and p a pointer to another
vertex. We assume the set of possible features to be ordered, hence the list
of pairs can be ordered on features.

• kind: indicates the kind of vertex, i.e., constant, variable, or complex.

• label: denotes the label of a vertex (only applicable to leaves), i.e., a
constant.

• class: pointer to a vertex in the same equivalence class. If u.class = u then
u is the representative of the class.

There are three kinds of vertices: complex vertices have a non-empty list of 
features and no label; constant vertices are labelled with a constant but have 
no features; variable vertices have neither features nor label. 
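As an illustration, such a vertex record could be declared as follows. This is a sketch under our own naming: the attribute is called cls because class is a reserved word in Python, and the helper functions constant, variable and complex_vertex are hypothetical conveniences.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass(eq=False, repr=False)       # vertices are compared by identity
class Vertex:
    kind: str = "variable"             # "constant", "variable" or "complex"
    label: Optional[str] = None        # only used for constant leaves
    features: List[Tuple[str, "Vertex"]] = field(default_factory=list)
    cls: Optional["Vertex"] = None     # class pointer

    def __post_init__(self):
        if self.cls is None:
            self.cls = self            # initially every class is a singleton
        # keep the feature list ordered on feature name
        self.features.sort(key=lambda pair: pair[0])


def constant(label: str) -> Vertex:
    return Vertex(kind="constant", label=label)


def variable() -> Vertex:
    return Vertex()


def complex_vertex(**features: Vertex) -> Vertex:
    return Vertex(kind="complex", features=list(features.items()))


sing = constant("sing")
agr = variable()
np = complex_vertex(num=sing, agr=agr)
```

A vertex u is the representative of its class exactly when u.cls is u, which holds for every freshly created vertex.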
function compute_equivalence_classes(fg1, fg2: vertex): boolean;
begin
    pairs_to_unify := {(fg1, fg2)};
    while pairs_to_unify is not empty
    do  take some pair (x, y) from pairs_to_unify;
        u := FIND(x); v := FIND(y);
        if u ≠ v
        then if compatible(u, v)
             then merge(u, v)
             else return(false)
             fi
        fi
    od;
    return(true)
end;

procedure merge(u, v: vertex);
(precondition: u, v are class representatives)
begin
    x := UNION(u, v);    (* i.e.: either x = u or x = v *)
    if x = u then y := v else y := u fi;
    if x.kind = variable and y.kind ≠ variable
    then x.kind := y.kind; x.label := y.label fi;
    for each feature-pointer-pair (f, p) ∈ y.features
    do  if there is some (f, q) ∈ x.features
        then add (p, q) to pairs_to_unify
        else add (f, p) to x.features
        fi
    od
end;



Fig. 9.1. Computation of equivalence classes 



For the proper functioning of the algorithm it is essential that the rep- 
resentative of an equivalence class has the characteristic properties (i.e. kind 
and either label or features) of the entire class. Hence proper care has to 
be taken when two classes are merged. One of the two representatives will
become the representative of the merged class, and has to take over the relevant
properties of the other representative, if not already present.

A straightforward algorithm for the computation of the equivalence classes 
is given in Figure 9.1. If the algorithm is run on composite feature structures, 
then pairs_to_unify should be initialized with all pairs of roots that have to
be unified. 

As a simple example, consider the feature graphs in Figure 9.2 as
operands. Initially, pairs_to_unify = {(1,6)}. A call to UNION(1,6) yields 1
as representative of the combined class. (To be deterministic, we assume that
the representative of the first argument is chosen if both classes are equally
large.) Merging the feature lists

1.features = [(f,2), (g,4)] and 6.features = [(f,7), (g,7), (h,9)]

we get

1.features := [(f,2), (g,4), (h,9)]

with pairs_to_unify = {(2,7), (4,7)}. We continue taking the union of {2}
and {7}, yielding an equivalence class {2,7} represented by 2. Taking over
the feature k from 7, we get

2.features := [(j,3), (k,8)].

A call UNION(4,7) merges the classes {2,7} and {4}, choosing 2 as their joint
representative. Merging the features of 4 into those of 2 yields a last pair
to be unified: (5,8). When this is done, we have reduced 10 vertices to 6
equivalence classes

{1,6}, {2,4,7}, {3}, {5,8}, {9}, {10}

as shown in Figure 9.3. Vertices within one class are linked with ===; the
representative is indicated by a double circle. The actual tree structure of the
equivalence class is irrelevant.
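The computation traced above can be reproduced with a short program. The sketch below is an illustration rather than a transcription of Figure 9.1: vertices are plain integers, the graphs are given as dictionaries, path halving is used instead of full path flattening, and the kinds of the leaves 3, 5 and 8 (taken to be variables here) are an assumption, since they cannot be read off from the text. The function name unify_classes and its parameters are hypothetical.

```python
def unify_classes(features, labels, root1, root2):
    """Compute the equivalence classes for two operands, or None on failure.

    features: {vertex: {feature: vertex}}; labels: {vertex: constant} for
    constant leaves. Both dictionaries are modified (destructive unification).
    """
    parent = {v: v for v in features}
    size = {v: 1 for v in features}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]      # path halving (simpler variant)
            v = parent[v]
        return v

    def kind(v):
        if v in labels:
            return "constant"
        return "complex" if features[v] else "variable"

    def compatible(u, v):
        ku, kv = kind(u), kind(v)
        if "variable" in (ku, kv):
            return True
        if ku == kv == "constant":
            return labels[u] == labels[v]
        return ku == kv == "complex"           # constant versus complex fails

    pairs = [(root1, root2)]
    while pairs:
        x, y = pairs.pop(0)
        u, v = find(x), find(y)
        if u == v:
            continue                           # already equivalent
        if not compatible(u, v):
            return None
        if size[u] < size[v]:                  # larger class wins, first on a tie
            u, v = v, u
        parent[v] = u
        size[u] += size[v]
        if v in labels:                        # representative takes over the label
            labels[u] = labels[v]
        for f, p in sorted(features[v].items()):
            if f in features[u]:
                pairs.append((features[u][f], p))
            else:
                features[u][f] = p             # representative takes over the feature
    classes = {}
    for v in features:
        classes.setdefault(find(v), []).append(v)
    return sorted(sorted(c) for c in classes.values())


graph = {1: {"f": 2, "g": 4}, 2: {"j": 3}, 3: {}, 4: {"k": 5}, 5: {},
         6: {"f": 7, "g": 7, "h": 9}, 7: {"k": 8}, 8: {}, 9: {"f": 10}, 10: {}}
classes = unify_classes(graph, {10: "c"}, 1, 6)
```

The sketch returns only the partition into classes. Which member represents a class depends on the order in which pairs are handled, and is irrelevant for the result.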




Fig. 9.3. The equivalence classes 



As a final step, we have to contract the classes to single vertices. To that
end, pointers to a non-representative vertex must be changed to pointers to
their representatives. I.e., if (f,p) is a feature and FIND(p) ≠ p then it has
to be replaced by (f,FIND(p)). In our example, the feature (g,4) becomes
(g,2) in 1.features and (k,8) is changed to (k,5) in 2.features. The
non-representative vertices are deleted and every class is a singleton again. The
final situation is shown in Figure 9.4.




A point that should be noted is that we do not allow cycles in feature 
graphs. It is conceivable, however, that non-cyclic operands unify to a cyclic 
(and hence inconsistent) graph. Hence the resulting graph has to be checked 
for cycles before it is delivered as a unifact. In Section 9.2 we will discuss in 
more detail how redirecting of pointers and checking for cycles can be done 
in a single sweep through the graph. 

The complexity of our version of Huet’s algorithm for feature graph
unification can be derived as follows. Let k be the maximum number of features
(i.e., the maximum outdegree) of a given vertex, and n the number of vertices
in the feature graphs. Then the algorithm has complexity O(kn α(kn)), for the
following reasons.

Pairs of vertices taken from pairs_to_unify come in two categories: the
pair can either be already equivalent or not yet equivalent. Every pair
generates two calls to FIND. Only not-yet-equivalent pairs generate a call to
merge, which calls UNION and merges lists of up to k feature-pointer-pairs.
Furthermore, up to k new pairs of vertices can be added to pairs_to_unify.
The number of not-yet-equivalent pairs is limited to n (after which all
vertices are equivalent), hence the total number of pairs, counting duplicates,
that can be added to pairs_to_unify is kn. The O(kn) already equivalent pairs
generate O(kn) UNION/FIND calls, the O(n) not yet equivalent pairs generate
O(n) UNION/FIND calls and O(kn) other work; two lists of feature-pointer-
pairs can be merged in O(k) steps when sorted on feature. Thus computing
the equivalence classes takes O(kn α(kn)) steps.

Subsequently, pointers to non-representative vertices have to be replaced
by pointers to their representatives. This takes O(kn) steps. Absence of
cycles can be detected in O(kn) steps using a depth-first search. In summary,
O(kn α(kn)) steps suffice.

Practically speaking, the factor O(α(kn)) is constant and we obtain a
complexity of O(kn). Moreover, for any particular unification grammar, the
number of features emerging from a particular vertex will be bounded by a
constant k, in which case the complexity is reduced to O(n). Thus
the algorithm is linear for all practical purposes.

A much cited unification algorithm for feature structures is the congruence 
closure algorithm of Nelson and Oppen [1977, 1980]. A more general version 
is given by Gallier [1986]. The congruence closure algorithm is also based 
on the UNION-FIND algorithm of Aho, Hopcroft, and Ullman [1974] and can 
be regarded as a generalization of Huet’s algorithm. It computes equivalence 
classes of a set of vertices of a graph consisting of an arbitrary number of 
components, starting from an arbitrary initial partition into classes. Nelson 
and Oppen give a worst-case complexity of O(m²), with m the number of
edges in the graph. An implementation with a theoretically lower complexity
bound O(m log² m) is given by Downey, Sethi, and Tarjan [1980], but it
appears not to be faster in practice [Nelson and Oppen, 1980]. When restricted
to, or reformulated specifically for, feature graph unification, the congruence
closure algorithm is very similar to the extension of Huet’s algorithm
discussed above. A recent survey of UNION-FIND and related algorithms is given
by Galil and Italiano [1991]. 

Different unification algorithms with the same complexity as Huet’s have 
been given by Baxter [1973] for term unification and Ait-Kaci [1984, 1986] 
for feature structures. Truly linear term unification algorithms also exist, but 
the improvement is only theoretically relevant. Linear algorithms are given by 
Paterson and Wegman [1987], de Champeaux [1986] and Martelli and
Montanari [1977, 1982]. A quadratic (O(n²)) implementation of the (originally
exponential) algorithm of Robinson [1965] is given by Corbin and Bidoit [1983].
They claim their algorithm to be simpler than the algorithm of Martelli and 
Montanari, and faster in practical applications. 



9.2 Nondestructive graph unification 

The graph unification algorithm presented above destroys the operands in 
the process of constructing a unifact. As operands typically must be used 
more than once, each operand has to be copied before unification takes place. 
Moreover, if the unification fails, the copies are wasted entirely. It turns out 
that copying accounts for more than half the time spent by a parser using a 
destructive unification algorithm [Karttunen and Kay, 1985], [Godden, 1990]. 
It is not too difficult, however, to change the unification algorithm in such a 
way that unification is nondestructive, i.e., the operands are not affected by 
the computation of the unifact. Rather than a final situation as displayed in 
Figure 9.4, we would like to obtain a final situation as shown in Figure 9.5. 
To that end, we make the following changes to the algorithm:

• each equivalence class is represented by a new vertex, rather than a vertex
from one of the operands.

• when the unifact has been constructed, the class pointers of the operands 
are reset. 

An algorithm in this vein was first presented by Wroblewski [1987]. When
two singleton classes are merged, a third vertex is created as their joint
representative. Only when two non-singleton classes are merged has a spurious
vertex been created, because either of the two new vertices suffices to
represent the merged class. Subgraphs that occur in only one of the operands
have to be copied for the unifact.

Wroblewski’s algorithm has a practical problem in deciding when
a subgraph needs to be copied, which causes the algorithm to make duplicate
copies in some odd cases. See [Wroblewski, 1987] for details. For resetting
the class pointers, Wroblewski suggests a simple implementation trick. Each 
class pointer is annotated with a generation number. Any pointer with an 
obsolete generation number should be regarded as a self-pointer (i.e. points 
to the vertex it originates from). Thus, after the unifact has been completed, 
incrementing the global generation counter suffices to reset all pointers in one 
stroke. 
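The generation trick can be sketched as follows. The class ClassPointers and its method names are our own illustration, not Wroblewski's code: every class pointer is stored together with the value of the generation counter at the time it was set, and a pointer with an obsolete stamp is read as a self-pointer.

```python
class ClassPointers:
    """Class pointers that can all be reset in constant time (illustrative sketch)."""

    def __init__(self):
        self.generation = 0
        self._pointer = {}      # vertex -> (target, generation stamp)

    def get(self, v):
        target, stamp = self._pointer.get(v, (v, -1))
        # an obsolete pointer counts as a pointer to the vertex itself
        return target if stamp == self.generation else v

    def set(self, v, target):
        self._pointer[v] = (target, self.generation)

    def reset_all(self):
        # invalidates every pointer set so far, without touching any vertex
        self.generation += 1


pointers = ClassPointers()
pointers.set("v2", "v1")
before = pointers.get("v2")
pointers.reset_all()
after = pointers.get("v2")
```

Resetting thus costs a single increment, independent of the number of vertices that took part in the unification.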






Fig. 9.5. An example of nondestructive unification 



The algorithm of 9.1 can be adapted with only a few changes. In the 
nondestructive algorithm a vertex has the following attributes: 

• features, kind, label, class: as in 9.1. 

• status: takes values old, new, and intermediate. 

All vertices of the operands are old, newly created vertices are new. The 
intermediate state is a technical aid for the construction of the unifact from 
the final set of equivalence classes. 

We add a function n_union for nondestructive union. It creates a new
vertex when the representatives of both classes are old. When classes
represented by a new and an old vertex have to be merged, we can simply take the
existing new vertex as a representative of the merged class. This is supported
by the UNION implementation in [Aho et al., 1974], which takes the root of the
larger class tree as the root of the merged class trees. The function n_union
is defined in Figure 9.6.



function n_union(u, v: vertex): vertex;
(precondition: u, v are class representatives)
begin
    if (u.status = new or v.status = new)
    then w := UNION(u, v)
    else create a new vertex w;
         u.class := w; v.class := w; w.class := w;
         w.kind := variable; w.label := none;
         w.features := nil; w.status := new
    fi;
    return(w)
end;



Fig. 9.6. Nondestructive union

Figure 9.7 shows how the equivalence classes can be computed nondestructively.
It is guaranteed that the operands are not changed by the unification
algorithm, as no attribute of a vertex of an operand ever gets changed (with 
the exception of the class pointer). 

The complete unification algorithm is sketched in Figure 9.8. Retrieving 
the unifact from the final partition into equivalence classes is somewhat dif- 
ferent from the destructive case. In a single walk through the new graph, the 
applicable feature pointers are redirected, the new vertices are converted to 
old ones and the graph is checked for cycles. 

As feature graphs are acyclic by definition, the unification should fail after 
all if a cycle is detected. Cycle detection can be trivially incorporated in the 
walk through the new graph. While going down, the status of new vertices is 
changed into intermediate; while going up, the status of vertices is changed
into old. Clearly, the graph contains a cycle iff at some stage a new vertex
is found with an intermediate daughter. 
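The walk with three status values can be sketched as follows. The sketch assumes that a vertex becomes old once all of its daughters have been handled, in line with the conversion of new vertices to old ones mentioned above; redirecting the feature pointers is left out. The function name finish and the dictionary encoding of the graph are assumptions of this illustration.

```python
def finish(vertex, features, status):
    """Walk the new graph from vertex; return False iff a cycle is found.

    features: {vertex: {feature: vertex}}
    status:   {vertex: "new" | "intermediate" | "old"}, updated in place
    """
    if status[vertex] == "old":
        return True                         # already handled via another path
    status[vertex] = "intermediate"         # going down
    for daughter in features[vertex].values():
        if status[daughter] == "intermediate":
            return False                    # daughter lies on the current path
        if status[daughter] == "new" and not finish(daughter, features, status):
            return False
    status[vertex] = "old"                  # going up
    return True


# an acyclic graph in which vertex "b" is shared by two features of "a"
dag = {"a": {"f": "b", "g": "b"}, "b": {"h": "c"}, "c": {}}
dag_status = {v: "new" for v in dag}
dag_ok = finish("a", dag, dag_status)

# a graph with a cycle a -> b -> a
loop = {"a": {"f": "b"}, "b": {"g": "a"}}
loop_ok = finish("a", loop, {v: "new" for v in loop})
```

A shared subgraph is walked only once, because its root has already become old when it is reached for the second time.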

The class pointers can be reset later by walking through the operands. 
A more efficient implementation, as suggested above, is to keep a global 
generation counter; all class pointers can be invalidated by increasing the 
generation counter. 

We will run through the example again, taking the graphs in Figure 9.2 
as operands. 

Computing the equivalence classes proceeds as follows. Initially there is
only one pair to unify: (1,6), the pair of roots. Hence the equivalence classes
{1} and {6} are merged into {1,6,11} with the new vertex 11 representing the
class. The features of 11 are computed by merging 1.features = [(f,2), (g,4)]
with 6.features = [(f,7), (g,7), (h,9)], yielding

11.features = [(f,2), (g,4), (h,9)]

function compute-equivalence-classes(fg1, fg2: vertex): boolean;
begin pairs-to-unify := {(fg1, fg2)};
      while pairs-to-unify is not empty
      do take some pair (x, y) from pairs-to-unify;
         u := FIND(x); v := FIND(y);
         if u ≠ v then
            if compatible(u, v)
            then merge(u, v)
            else return(false)
         fi fi
      od;
      return(true)
end;

procedure merge(u, v);
begin x := n-union(u, v);
      for y := u, v
      do if y ≠ x then
            if x.kind = variable and y.kind ≠ variable
            then x.kind := y.kind; x.label := y.label fi;
            for each feature-pointer-pair (f, p) ∈ y.features
            do if there is some (f, q) ∈ x.features
               then add (p, q) to pairs-to-unify
               else add (f, p) to x.features;
                    if FIND(p).status = old
                    then copysubgraph(FIND(p))
               fi fi
            od fi od
end;

procedure copysubgraph(x);
(precondition: x.class = x, x.status = old)
begin create a new vertex y; x.class := y; y.class := y;
      y.kind := x.kind; y.label := x.label; y.status := new;
      y.features := copy-list(x.features);
      for each pair (f, q) ∈ y.features
      do if q.status = old then copysubgraph(q) fi od
end;



Fig. 9.7. Nondestructive computation of equivalence classes 



with (2,7) and (4,7) as new pairs to be merged and the subgraph rooted at
9 to be copied. The call copysubgraph(9) creates a new vertex 12 as a
representative of the equivalence class {9, 12}. As the features of 12 include
a pointer to vertex 10, a new copy 13 of vertex 10 is created, also labelled
with the constant c.

Next, we merge 2 and 7 into {2,7,14}, with features j and k of vertex 
14 pointing to 3 and 8, respectively. Using copysubgraph, these vertices are
extended to equivalence classes {3, 15} and {8, 16}. One pair is left to unify: 
(4,7). Hence equivalence classes {4} and {2,7,14} are merged into {2, 4, 7, 14}. 
Following the definition of merge in Figure 9.7, we have to merge the features 
of 4 and 7 into the features of 14. Both 4 and 7 have only feature k which 

function unify(u, v: vertex): vertex;
begin
   if compute-equivalence-classes(u, v)
   then w := FIND(u);
        if not wind-up(w) then w := ⊥ fi
   else w := ⊥
   fi;
   reset the class pointers;
   return(w)
end;

function wind-up(v: vertex): boolean;
(redirects feature pointers as appropriate;
 makes new vertices old; checks for cycles)
begin
   if v.status = intermediate then return(false) fi;
   if v.status = new
   then v.status := intermediate;
        for each pair (f, w) ∈ v.features
        do y := FIND(w);
           if y ≠ w then replace (f, w) by (f, y) fi;
           if not wind-up(y) then return(false) fi
        od;
        v.status := old
   fi;
   return(true)
end;



Fig. 9.8. The unification algorithm 



is already present in the feature list of vertex 14 (pointing to 8). Hence we
add (5,8) and (8,8) to the pairs to unify. As 8 and 8 are members of the same
class, no work needs to be done^. Unifying 5 and 8 means merging {5} and
{8, 16} into the equivalence class {5, 8, 16}.

The list of pairs is empty now. The situation is sketched in figure 9.9.
Equivalent vertices are linked; the representative of each class is indicated
with a double circle.

From the graph in figure 9.9 we can construct the unifact straightforwardly.
The features of 11, [(f, 2), (g, 4), (h, 9)], are replaced by

[(f, FIND(2)), (g, FIND(4)), (h, FIND(9))] = [(f, 14), (g, 14), (h, 12)].

Similarly, to 14.features the list

[(j, FIND(3)), (k, FIND(5))] = [(j, 15), (k, 16)]

is assigned, and so on. Thus we construct the final graph, which was displayed
in figure 9.5 on page 181.

^ One could also add a check in merge so as to prevent equivalent pairs from
being put on the list of pairs to be unified.

Fig. 9.9. The equivalence classes in nondestructive unification



The complexity of the nondestructive algorithm, like that of the destructive
algorithm, is theoretically O(kn α(kn)), with k the maximum outdegree of a
vertex, and practically O(kn). If k is considered constant (as it will be for
any particular grammar) the algorithm is linear in the size of the operands.



9.3 Further improvements 

In unification grammar applications, the nondestructive algorithm is more 
efficient than the destructive algorithm, because the operands need not be 
copied before unification. The algorithm presented in 9.2 is by no means 
optimal, however. The number of vertices to be copied can be further reduced 
by subgraph sharing. If a feature exists in only one of the operands, it is 
usually not necessary to copy the entire subgraph pointed to by that feature. 
The unifact could share a subgraph with one of its operands. A unification 
algorithm that exploits subgraph sharing could create a unifact as shown in 
figure 9.10. In our example, only 3 new vertices need be created, rather than 
6 as in figure 9.5. 




Fig. 9.10. Subgraph sharing

A unification algorithm that exploits subgraph sharing is rather more
involved; it must keep track of the conditions under which subgraph sharing 
is safe. Subgraph sharing and coreferencing can interfere with each other, 
leading to incorrect results. A more detailed treatment is given by Kogure 
[1990], who describes a nondestructive unification algorithm with subgraph 
sharing. This algorithm uses a form of lazy copying. Subgraphs are shared
between the unifact and an operand as long as there is no evidence that making
a copy is necessary. When it is detected that a descendant of a shared vertex
will be affected by unification at some later moment, the shared subgraph
needs to be copied after all.

Kogure extends his “lazy incremental copy graph unification algorithm” 
with a strategy that first unifies those features that are most likely to cause 
failure. Such a strategy could be added to Huet-type algorithms as well, as 
no order is prescribed in which pairs are to be taken from the list of pairs to 
be unified. 

Karttunen and Kay [1985] use a destructive unification algorithm in com- 
bination with lazy copying: subgraphs are shared until one of the shared 
copies is updated. Furthermore, feature graphs are represented in [Karttunen 
and Kay, 1985] by means of binary trees; a parent-child relation (i.e., an edge 
of the feature graph) is represented by a search path in the binary tree. The 
method is not worked out in great detail in the cited article. 

Pereira [1985] does not copy feature graphs, but keeps updates to a feature
graph separately. The original feature graph is not changed; additions are kept
in a separate structure. Thus the cost of making copies is traded against the
cost of applying the update. This technique dates back to the theorem prover 
of Boyer and Moore [1972]. 

Karttunen’s “reversible unification algorithm” [Karttunen, 1986] is in fact 
also a nondestructive algorithm. Only temporary changes are made to the 
operands. If the unification succeeds, a separate unifact is constructed. 

Tomabechi [1991] merges Karttunen’s approach with the nondestructive
algorithm of Wroblewski [1987]. He claims his algorithm to be twice as fast as
Wroblewski’s. As in [Karttunen, 1986], not a single new vertex is created until
the unification is known to be successful. From Wroblewski [1987] he takes
the technique to undo all temporary changes to the operands in one stroke
by using a global generation counter.

Emele [1991] in a very readable paper comes up with an algorithm that 
merges the approaches of Pereira [1985] and Wroblewski [1987] in an elegant 
fashion. Vertices carry generation numbers. In addition, each feature graph is 
associated with some specific generation. When a vertex is changed in a later 
generation, a forwarding pointer to a new vertex is made. Thus a vertex has 
a history over time, represented by a chain of vertices with non-decreasing
generation numbers. When a feature graph of a particular generation has
to be retrieved, each vertex in this graph is found by following the path of
forwarding pointers up to the last vertex that has a generation number not
exceeding the generation asked for. In Emele’s algorithm the unifact is in fact
the next generation of one of its operands. From a single root, the unifact
can be retrieved using a higher generation number, while the operand can be 
retrieved using a lower generation number. 
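
The retrieval of a vertex for a given generation can be illustrated with a small Python sketch. It is not Emele's implementation: the `Version` class and the functions `update` and `dereference` are hypothetical names, and feature values are kept as plain atoms for brevity.

```python
class Version:
    def __init__(self, generation, features):
        self.generation = generation
        self.features = features      # feature name -> value (atoms here)
        self.forward = None           # newer version of the same vertex, if any

def update(version, generation, features):
    """Append a newer version at the end of the forwarding chain."""
    last = version
    while last.forward is not None:
        last = last.forward
    if generation < last.generation:
        raise ValueError("generation numbers must be non-decreasing")
    last.forward = Version(generation, features)
    return last.forward

def dereference(version, generation):
    """Follow forwarding pointers up to the last version whose generation
    number does not exceed the generation asked for."""
    while (version.forward is not None
           and version.forward.generation <= generation):
        version = version.forward
    return version
```

Every lookup walks the chain from its start, so the cost of `dereference` grows with the number of versions a vertex has accumulated.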

A disadvantage of Emele’s approach is that the paths of forwarding point- 
ers cannot be shortened. Hence the complexity of searching a graph (and, 
consequently, the complexity of unification) is dependent on the length of its 
history as well as its size. This makes the theoretical complexity essentially 
non-linear. It seems likely, however, that Emele’s algorithm might be superior 
in practice. 

Finally, a somewhat different approach is taken by Godden [1990] who 
introduces “lazy unification”, i.e., unification (rather than copying) of sub- 
structures is delayed. This is in principle an interesting idea, but it needs 
substantial additional overhead. While obtaining a speedup of 50 % com- 
pared to naive, destructive unification, his algorithm is substantially slower 
than the ones from Tomabechi and Emele. 

It has been remarked by several of the authors cited above that it depends 
on the particular application which approach to reduce copying will perform 
best. 



9.4 Disjunctive feature structures 



By far the most interesting extension to the unification grammar formalism is 
the use of disjunctive feature structures. For a verb form “catch”, for example, 
we would like to write 



⟨catch head agr⟩ = [number: plural] ∨ [number: singular, person: 1st ∨ 2nd]



One could also add negation, and simply write down that the agreement of 
“catch” is not third person singular. 

It is always possible to avoid disjunction within feature structures by
rewriting them into disjunctive normal form. For the verb form “catch” we
would then obtain three lexicon entries with agreement features plural, first
person singular and second person singular, respectively^. But for the sake
of efficiency it is not desirable to use disjunctive normal form.



^ The lexicon may also contain other entries for “catch” as, e.g., a verb in infinitival 
form. But that entry does not specify any agreement. 

In order to obtain a graph representation for disjunctive feature structures,
we can modify the graph representation of standard feature structures
as follows. Every vertex is split into two vertices: a “top half” called a
feature vertex and a “bottom half” called a value vertex. All incoming edges go
to the feature vertex; all outgoing edges start from the value vertex. In the
standard case, without disjunction, every feature has exactly one value, i.e.,
every feature vertex has a single outgoing edge to its corresponding value
vertex.

In a disjunctive feature graph it is possible that a feature vertex is linked
to different value vertices. If a feature vertex is linked to no value vertex, this
represents an inconsistency. A disjunctive feature graph is shown in figure
9.11. Feature vertices are represented by △, value vertices by ▽. In figure
9.11(a) the bipartite graph is shown. In a rather more practical notation,
as shown in figure 9.11(b), feature vertices that have exactly one value are
combined with their value vertices.



Fig. 9.11. A disjunctive feature graph

The same information can be represented by different feature graphs. It 
is always possible to push the disjunction upwards to the top level. In that 
way we only have to deal with standard feature graphs, but the number of 
different alternatives may grow rather large. For the simple example in figure 
9.11, two alternatives are shown in figure 9.12. In figure 9.12(b) we have 
moved all disjunctions to the root and we have obtained a disjunction over 
three nondisjunctive feature structures. 

A graph representation for disjunctive feature structures is formally de- 
fined as follows. 

Definition 9.2. (disjunctive feature graphs)

A bipartite directed graph G = (V₁, V₂, E₁→₂, E₂→₁) has two sets of vertices
V₁, V₂. Edges in E₁→₂ go from a vertex in V₁ to a vertex in V₂, edges in E₂→₁
go from a vertex in V₂ to a vertex in V₁. There are no edges connecting any
pair of vertices within V₁ or V₂.

The class of disjunctive feature graphs, 𝒟ℱ𝒢, is the class of finite, rooted,
bipartite DAGs (V_f, V_v, E_f→v, E_v→f) with the following properties:

(i) the root is an element of V_f, all leaves are in V_v;

(ii) every edge in E_v→f is labelled with a feature;

(iii) if f and g are labels of two different edges originating from the same
vertex in V_v, then f ≠ g;

(iv) all vertices in V_f have at least one outgoing edge;

(v) leaves are labelled with atomic values, non-leaf vertices have no label.

□

When we restrict the formalism to disjunctive feature trees, i.e., coreferencing
is not allowed, the unification algorithms can be adapted straightforwardly.
Let x and y be two feature vertices, {u₁, …, u_m} the value vertices
that are successors of x and {v₁, …, v_n} the value vertices that are successors
of y. When x and y have to be unified, a new set of m × n value vertices
{w₁₁, …, w_mn} is created, where w_ij merges the features of u_i and v_j. If u_i
and v_j appear to be inconsistent, then w_ij can be discarded. Only if all w_ij
are inconsistent is the unification of x and y inconsistent.
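
The pairwise construction for disjunctive feature trees can be sketched as follows. The representation is an assumption made for this example only: a feature vertex is a Python list of alternative value vertices, and a value vertex is either an atom (a string) or a dict from feature names to feature vertices. The function names are hypothetical.

```python
def unify_feature(x, y):
    """x and y are feature vertices (lists of alternatives). Return the list
    of consistent merged alternatives; an empty list means inconsistency."""
    result = []
    for u in x:
        for v in y:
            w = unify_value(u, v)     # one of the m x n candidates
            if w is not None:
                result.append(w)
    return result

def unify_value(u, v):
    """Merge two value vertices; return None if they are inconsistent."""
    if u == {}:                       # an empty structure carries no information
        return v
    if v == {}:
        return u
    if isinstance(u, str) or isinstance(v, str):
        return u if u == v else None  # atoms must be identical
    w = {}
    for f in u.keys() | v.keys():
        if f in u and f in v:
            alternatives = unify_feature(u[f], v[f])
            if not alternatives:
                return None
            w[f] = alternatives
        else:
            w[f] = u[f] if f in u else v[f]
    return w
```

With the agreement of “catch” written as `[{"number": ["plural"]}, {"number": ["singular"], "person": ["1st", "2nd"]}]`, unifying against a second person singular subject leaves a single alternative, while a third person singular subject leaves none.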

Extension of disjunctive feature graphs to a domain of multi-rooted
disjunctive feature graphs ℳ𝒟ℱ𝒢 is straightforward.

When coreferencing is allowed, one has to take care that disjunction and 
coreferencing do not interfere with each other. This can always be avoided by 
pushing all disjunctions outwards, until we have a disjunction over nondis- 
junctive feature structures. In a more subtle approach we could allow coref- 
erencing and disjunction within a feature structure as long as certain restric- 
tions are fulfilled. 

Definition 9.3. (safe disjunction)

A vertex v in a disjunctive feature graph is called circumventible if it has an
ancestor and a descendant such that there is a path from the ancestor to the
descendant that does not pass through v.

A disjunctive feature graph is called safe when every circumventible feature
vertex has exactly one successor. □

A unification algorithm for disjunctive feature graphs is safe if it makes sure 
that no unsafe feature graphs are created. A variety of unification algorithms 
for disjunctive feature graphs has been published, and we will not further 
pursue this matter here. 

Kasper [1987a] has proven that unification of disjunctive feature structures
is NP-complete. But worst cases do not apply in ordinary grammars.
Kasper [1987a, b], Eisele and Dörre [1988], and Dörre and Eisele [1990] have
come up with algorithms that perform well in the average case. Some recent
studies devoted to various kinds of disjunctive feature structure unification
are given by Maxwell and Kaplan [1989], Carter [1990], Hegner [1991], and
Nakano [1991]. A book with several other articles on this subject is edited by
Trost [1993].

Veronis [1992] has presented a mathematical framework for disjunctive 
feature structures based on hypergraphs, rather than bipartite graphs. 



9.5 Restriction 

In general, many different decorations can be recognized for a single context- 
free item. There are two general methods to reduce the number of decorations 
in a chart parser for unification grammars. 

Firstly, we can apply the notion of subsumption. When different decorations
φ₁(ξ) and φ₂(ξ) are recognized for some item ξ, and it holds that
φ₁(ξ) ⊑ φ₂(ξ), then we only need to retain (ξ, φ₁(ξ)) on the chart and we can
delete (ξ, φ₂(ξ)). We have assumed that only such unification grammars 𝒢 are
used for which the parsing system UG(𝒢) is guaranteed to be finite. Hence,
by applying this subsumption criterion, a finite set of recognized decorated
items can be replaced by a smaller set.
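
A subsumption filter of this kind might look as follows in Python. The sketch assumes that feature structures are nested dicts without coreference, that atoms are strings, and that the decoration to be kept is the more general one; the names `subsumes` and `prune` are hypothetical.

```python
def subsumes(general, specific):
    """True iff every feature and value in `general` also occurs in `specific`."""
    if isinstance(general, dict):
        if not isinstance(specific, dict):
            return not general        # the empty structure subsumes any atom
        return all(f in specific and subsumes(value, specific[f])
                   for f, value in general.items())
    return general == specific        # atoms must be identical

def prune(decorations):
    """Keep only decorations of one item that no other decoration subsumes."""
    kept = []
    for d in decorations:
        if any(subsumes(k, d) for k in kept):
            continue                  # d adds nothing to the chart
        kept = [k for k in kept if not subsumes(d, k)]
        kept.append(d)
    return kept
```

The filter compares each new decoration with those already kept for the same item, so its cost is quadratic in the number of decorations per item.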

A more fundamental problem is the possibly infinite set of decorations
that can be produced by adding top-down passing of features in a parsing 
schema. We will discuss this problem in detail and present restrictors as 
introduced by Shieber [1985a],® to guarantee finiteness of the Earley schema 
for unification grammars. 

® It is important to note that we use the terminology and notation of [Shieber,
1985a], not that of [Shieber, 1992]. Restriction, denoted ↾, is replaced in the latter
source by a restriction function ρ. Moreover, the restriction symbol ↾ is used
there for a different purpose, viz., restriction of top-level features (and dependent
substructures) by narrowing the domain from which these are drawn.

A restrictor is a kind of filter that can be used to remove irrelevant features
from a feature structure. It is not necessary to define restrictors for a
particular grammar “by hand”; in 9.6 it is shown how default restrictors can 
be defined as a function of the grammar. A different use of restrictors is dis- 
cussed in 9.7, where only a restricted set of features is taken into account in 
the first pass of a parser and secondary features are added in a second pass. 
But before we introduce restrictors we will motivate their need by means of 
an example. 

We will look at an example of a grammar for which the Earley schema 
produces an infinite number of items. Subcategorization of verbs can be en- 
coded in feature structures by giving a list of complements that a verb should 
have. The verb “catches” has two complements (subject and direct object), 
which can be expressed in a lexicon entry as in Figure 9.13. A verb that also
takes an indirect object will have a complement list of three NPs. Other verbs
could take a PP as complement.



Fig. 9.13. Lexical entry for “catches” with subcategorisation list



When subcategorization is deferred to the lexicon, the grammar could 
have a production like 

VP₁ → VP₂ NP

⟨VP₁ head⟩ = ⟨VP₂ head⟩

⟨VP₁ subcat first⟩ = ⟨VP₂ subcat first⟩

⟨VP₁ subcat rest⟩ = ⟨VP₂ subcat rest rest⟩

⟨NP⟩ = ⟨VP₂ subcat rest first⟩

The VPs are indexed to distinguish them from each other. The complement 
list in the subcat feature of VP₁ is one shorter than the corresponding list of
VP₂. That means (when applied to the verb “catches”) that a transitive verb
combined with a direct object yields a structure that has the subcategory 
of an intransitive verb. The first complement slot, which is reserved for the 
subject of the verb, is not affected. But all post-complements of the verb 
can be swallowed in this way, until a VP is left with only one (subject) 
complement. 



Fig. 9.14. The subject has been recognized



In the Earley schema for unification grammars, a problem occurs when
we are to predict a VP. Suppose that we have a recognized item [S→NP•VP,
0, 2] as in Figure 9.14. We can predict an item [VP→•VP NP, 2, 2], shown
in Figure 9.15. Now we can predict another item [VP→•VP NP, 2, 2] with a
different feature structure, shown in Figure 9.16. We can continue along this
line, predicting new VPs with ever more complements.

There is no theoretical reason why such problems should occur in top- 
down prediction and not in bottom-up parsing. One can construct unifica- 
tion grammars that cause a parser to loop infinitely in either direction. But, 
from a practical point of view, it is reasonable to expect that a unification 
grammar, using the schema UG, will yield only a finite number of different 
constituents for any sentence. It is less reasonable to expect the grammar
writer to take into account sophisticated parsing techniques, such as top-down
prediction, in order to reduce the number of recognized constituents that do
not contribute to a parse of the sentence. Therefore it makes sense to state



VPi 



VPo 



cat : VP 

m 



head : 



first : 



subcat : 

I 

rest : 
cat ; VP 

0 M 
head : [ J 



0 



subcat : 



first : 



rest : 



0 



cat : NP 
head: 



agr : 
trans : 



0 ■ 

first : 

0 

rest : end 



0 



NP H-4 [cat: NP] 

Fig. 9.15. A VP predicted from Figure 9.14 



that preventing infinite loops in bottom-up parsing is the responsibility of 
the grammar, whereas preventing infinite loops in top-down prediction is the 
responsibility of the parser. 

A general solution to the above problem, due to Shieber [1985a], is called
restriction. The basic idea is quite simple. When an item is predicted, only a
relevant subset of the features is used. Irrelevant features, or sub-features
beyond a certain depth, are simply deleted. In the case of the subcategorization
list, for example, we could decide that ⟨VP subcat first⟩ and
⟨VP subcat rest first⟩ are relevant features, while ⟨VP subcat rest rest⟩ is not
relevant. When the irrelevant tail of the subcategorization list is stripped off,
the items in Figures 9.15 and 9.16 become identical, and no more different
items [VP→•VP NP, 2, 2] can be predicted.

Restriction of features in predicted items might, in general, lead to recog- 
nition of “useless” items that are incompatible with the features that have 
been deleted. But, much more importantly, it will prevent an infinite sequence 
of predictions. When the features in predicted items are restricted to a finite 
domain, it follows immediately that only a finite number of items can be 
predicted. 

Fig. 9.16. A VP predicted from Figure 9.15



Further elaborations of the use of restriction are given by Gerdeman 
[1989], Bouma [1991], Nakazawa [1991] and Harrison and Ellison [1992]. Haas 
[1989] presents a general Earley-like parsing algorithm for depth-bounded uni- 
fication grammars. A grammar is depth-bounded if all parse trees for all 
sentences have a finite depth. A simple unification grammar with subcatego- 
rization as in the above example is depth-bounded, because every verb has 
a finite number of complements. The reader is referred to the cited papers for
further details. We will only incorporate Shieber’s general solution into our
parsing schema.

A restrictor is a feature structure that contains no constants and no
coreferences. One could see it, in graph notation, as a feature tree where the
leaves carry no labels, or, in constraint notation, as a set of feature paths. We
will use the AVM notation also for restrictors. The only difference in notation
is that we may delete the [ ] symbols to indicate that a feature has no value;
any feature without sub-features has no value by definition.

The idea, then, is the following. When a feature structure is restricted by 
some restrictor, only those features remain that are explicitly mentioned in 
the restrictor. The constant values of the features allowed by the restrictor 
are not prescribed and can vary according to the circumstances. In figure 9.17
a suitable restrictor is shown for a VP for a grammar with subcategorization
by means of a complement list.



R(VP) ↦ [ cat:
          head:
          subcat: [ first: [ head: [ agr: [ number:
                                            person: ] ] ]
                    rest:  [ first: [ cat: ] ] ] ]

Fig. 9.17. A suitable VP restrictor



The agreement of the subject is retained by the restrictor, because this 
is precisely what prediction is being used for. The trans feature of the sub- 
ject can be disposed of, as it has no relevance to the recognition of a verb 
phrase. When subject and VP are combined using a production S→NP VP,
the translations of the NP and VP will be combined into a trans feature for
S.



We will now give a formal definition of restrictors and restriction.

Definition 9.4. (restrictor)

A restrictor is a constraint set that contains only existential constraints, i.e.,
constraints of the form ⟨Xπ⟩ = [ ]. □

In a more practical notation, one could describe a restrictor as a set of paths,
rather than a set of constraints. But by defining a restrictor as a (special kind
of) constraint set, closure, normal form and constraint graphs follow
automatically from Section 8.1. Composite restrictors can be defined in similar
fashion. We will only use restrictors with a single parameter, however. We
write R(X) for a restrictor^, whether it is a constraint set, a feature graph
or a feature structure in general.

Next we define restriction, i.e., the application of a restrictor to a feature
structure. We will define it in the constraint set domain, but it extends to the 
feature graph domain as usual. Informally, applying a restrictor means that 
those features that occur in the restrictor remain, with their constant values. 
Formally, this is defined as follows. 

Definition 9.5. (restriction)

Let χ(X) be a constraint set and R(X) a restrictor. The restriction of χ(X)
by R(X) is the set χ′(X) ⊆ χ(X) that satisfies the following conditions:

^ Shieber [1985a] used Φ to denote a restrictor, but we must use another symbol
because, in this chapter, Φ denotes the domain of feature structures.

(i) if ⟨Xπ⟩ = μ ∈ closure(χ′(X)) then ⟨Xπ⟩ = [ ] ∈ closure(R(X));

(ii) if χ″(X) ⊆ χ(X) satisfies (i) then χ″(X) ⊆ χ′(X).

It is easy to verify that χ′(X) is uniquely determined.

We write χ(X)↾R(X) for the restriction of χ(X) by R(X). □

It is important to note the difference between the restriction operator ↾
and the glb operator ⊓. If we have, for example,

φ(X) = [number: singular, person: third],    R(X) = [number: ],

then we obtain

φ(X)↾R(X) = [number: singular],

φ(X) ⊓ R(X) = [number: [ ]].
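
The difference between the two operators can be made concrete with a small Python sketch. It assumes nested dicts without coreference, with strings as atomic values and the empty dict standing for [ ]; the function names are hypothetical.

```python
def restrict(fs, restrictor):
    """Keep only the features of fs that are mentioned in the restrictor.
    Constant values on retained paths survive; everything below a leaf of
    the restrictor is deleted."""
    if not isinstance(fs, dict):
        return fs                     # atomic value on a retained path
    return {f: restrict(fs[f], sub)
            for f, sub in restrictor.items() if f in fs}

def glb(a, b):
    """Greatest lower bound: the information shared by both structures."""
    if isinstance(a, dict) and isinstance(b, dict):
        return {f: glb(a[f], b[f]) for f in a.keys() & b.keys()}
    return a if a == b else {}        # differing values generalize to [ ]
```

With φ(X) = [number: singular, person: third] and a restrictor that mentions only number, `restrict` keeps the value singular, whereas `glb` keeps the feature number but loses its value.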



9.6 Default restrictors 

If we define a restrictor R(B) for each nonterminal B ∈ N, we can change the
predict rule of the Earley parsing schema to

D^Pred = {[A→α•Bβ, i, j]_η ⊢ [B→•γ, j, j]_ξ
          | φ(ξ) = φ(B_η)↾R(B) ⊔ φ₀(B→γ)}

(where we assume that ↾ has operator precedence over ⊔). Hence we could
extend a unification grammar to a structure

𝒢 = (G, Φ, φ₀, R, W, Lex)

with a function R that assigns a restrictor to every nonterminal. But this is
not a satisfactory solution. One should not change the definition of a grammar
only to allow certain efficient parsing techniques if this can also be obtained
with grammars as in Definition 8.29. Hence we introduce the notion of a
default restrictor that is uniquely determined by G and φ₀.

The default restrictor for a nonterminal B can be defined informally as the 
set of features for B that is obtained by collecting all features for B from all 
productions in which it occurs as a right-hand side symbol. One can take all 
feature structures for B from productions A→αBβ, throw away coreference
and constant values and then unify the remaining structures. (Note that this 
cannot lead to inconsistency; because of the absence of atomic values there 
can be neither value/value clashes nor feature/value clashes.) 
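
The informal construction can be sketched in Python. The encoding is an assumption made for the example: a production is a triple (lhs, rhs, decorations), where decorations[k] is the feature structure that the production specifies for the k-th right-hand side symbol, written as a nested dict without coreference. All names are hypothetical.

```python
def strip_values(fs):
    """Throw away constant values, keeping only the feature paths."""
    if not isinstance(fs, dict):
        return {}
    return {f: strip_values(value) for f, value in fs.items()}

def merge_paths(a, b):
    """Unify two value-free structures; this cannot fail, as no atoms remain."""
    merged = dict(a)
    for f, sub in b.items():
        merged[f] = merge_paths(merged.get(f, {}), sub)
    return merged

def default_restrictor(nonterminal, productions):
    """Collect all features given for `nonterminal` wherever it occurs as a
    right-hand side symbol."""
    restrictor = {}
    for _lhs, rhs, decorations in productions:
        for symbol, fs in zip(rhs, decorations):
            if symbol == nonterminal:
                restrictor = merge_paths(restrictor, strip_values(fs))
    return restrictor
```

Because only the productions are inspected, the result can be computed once per grammar, before any sentence is parsed.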

Formally, a default restrictor is defined as follows. 

Definition 9.6. (default restrictor)

Let 𝒢 = (G, Φ, φ₀, W, Lex) be a unification grammar. For each B ∈ N a
default restrictor R₀(B) is defined as the (unique) restrictor that satisfies the
following conditions:

(i) for any production A→αBβ ∈ P it holds that …;

(ii) for any R(B) that satisfies (i) it holds that R₀(B) ⊆ R(B).

It is left to the reader to verify that R₀ is finite and uniquely defined. The
default restrictor, hence, can be seen as a function R₀ : N → … □

Thus, finally, we can write down a restrictive version of the Earley parsing
schema.

Schema 9.7. (Earley(R))

For an arbitrary unification grammar 𝒢 = (G, Φ, φ₀, W, Lex) ∈ 𝒰𝒢 a parsing
system ⟨I_Earley(R), H, D_Earley(R)⟩ is defined by

I_Earley(R) = {[A→α•β, i, j]_ξ | A→αβ ∈ P ∧ 0 ≤ i ≤ j ∧
               φ₀(A→αβ) ⊑ φ(ξ) ∧ …},

D^Init = {⊢ [S→•γ, 0, 0]_ξ | φ(ξ) = φ₀(S→γ)},

D^Scan = {[A→α•aβ, i, j]_η, [a, j, j+1]_ζ ⊢ [A→αa•β, i, j+1]_ξ
          | φ(ξ) = φ(η) ⊔ φ(a_ζ)},

D^Compl = {[A→α•Bβ, i, j]_η, [B→γ•, j, k]_ζ ⊢ [A→αB•β, i, k]_ξ
           | …},

D^Pred = {[A→α•Bβ, i, j]_η ⊢ [B→•γ, j, j]_ξ
          | φ(ξ) = φ(B_η)↾R₀(B) ⊔ φ₀(B→γ)},

D_Earley(R) = D^Init ∪ D^Scan ∪ D^Compl ∪ D^Pred,

and H as in (8.1). □

Theorem 9.8. (halting of Earley(R))

For any unification grammar 𝒢 ∈ 𝒰𝒢 and any string a₁…a_n ∈ W* it holds
that

if V(UG(𝒢)(a₁…a_n)) is finite,

then also V(Earley(R)(𝒢)(a₁…a_n)) is finite.

Proof: straightforward. □

9.7 Two-pass parsing

So far we have assumed that a parse for a unification grammar is constructed 
by a parsing schema that employs decorated items. This can be called one- 
pass parsing, because the parse trees and their decorations (of which only 
relevant parts are represented) are constructed simultaneously. As an
alternative, one could apply two-pass parsing to unification grammars, as follows:

• in the first pass a forest of context-free parse trees is constructed; 

• in the second pass these parse trees are decorated; 
trees with an inconsistent decoration are discarded. 

One could refine the two-pass scheme into an arbitrary number of passes, 
where each one adds some more detail to the end-product of the previous 
pass. One finds parsers for programming languages that have four or more 
passes. Details of such implementations are of no importance here, but the 
distinction between one-pass and two-pass is a fundamental one in our general 
framework. 

In a two-pass parser, the first pass actually contains two phases. In the 
first phase a set of items is recognized (based on some context-free parsing 
schema) as usual. In the second phase of pass one, the recognized items that 
do not contribute to a parse are located and discarded. How many items
remain depends on the grammar, the parsing schema and the sentence, but
typically only a small percentage remains.
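
The second phase of pass one amounts to a reachability computation over the recorded deduction steps. A minimal sketch, assuming that the first phase has stored, for every recognized item, the tuples of antecedents from which it was derived (the dict `antecedents` and the function name are hypothetical):

```python
def useful_items(final_items, antecedents):
    """Return the items that contribute to some parse, i.e. those reachable
    from a final item by following deduction steps backwards."""
    useful = set()
    stack = list(final_items)
    while stack:
        item = stack.pop()
        if item in useful:
            continue
        useful.add(item)
        for premises in antecedents.get(item, []):
            stack.extend(premises)
    return useful
```

Only the items in the returned set need to be decorated in the second pass; all other recognized items can be discarded.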

While it is true that some valid context-free items are not recognized by
a one-pass parser (due to inconsistent decorations), two-pass parsing seems
to be rather more efficient than one-pass parsing. Unification is a rather
expensive operation, and by two-pass parsing a number of irrelevant
unifications can be avoided.

The above considerations are as vague as they are general, because much 
depends on the nature of the unification grammar. We have assumed, for the 
sake of simplicity, that there is some context-free backbone to the grammar. 
It is within the limits of the formalism, however, to construct a grammar 
with a context-free backbone 

N = {X}, Σ = …, P = {X→X, X→XX}, S = X

and leave the traditional lexical category to some particular categorization 
feature. This is in fact the way in which a unification grammar without a 
context-free backbone is to be emulated in our framework. It is clear that 
two-pass parsing does not make sense for such a grammar. 

In a more subtle approach we do not need to make a binary choice between 
one-pass parsing and two-pass parsing. An intermediate form can, in general 
terms, be described as follows: 

• in the first pass, only some primary features are used, the remaining
secondary features are disregarded;

• in a second pass, the full decoration of the remaining items is obtained. 

A formalism in which such an intermediate parser can be described has been 
introduced already in 9.5. We can describe the primary features of each non- 
terminal A by a restrictor ^{A). All feature structures in the first pass are 
trimmed by a restrictor, both in bottom-up and top-down direction. It is 
important to remark that restricted features constitute a finite domain. That 
is, a context-free backbone enhanced with primary features is a context-free 
grammar® and thus constitutes a larger context-free backbone for (essen- 
tially) the same grammar. 

After the first pass, all recognized items that do not contribute to a parse 
can be discarded. The secondary features, subsequently, are added only to 
the remaining items. 

A specification of an intermediate parser can be given by means of a 
parsing schema and an additional restrictor function that defines
the primary features. For the implementation of such a parser it might be
advantageous to compile the context-free backbone with primary features 
into a larger context-free grammar. This can be done mechanically. 
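
The mechanical compilation can be sketched as follows. This is only an illustration under stated assumptions: the toy grammar `RULES`, the finite value sets in `DOMAINS`, and the encoding of agreement through shared variable names are all hypothetical, and the restrictor is modelled simply as a set of primary feature names.

```python
from itertools import product

# Hypothetical toy unification grammar. Each symbol is (category, features),
# where features maps a feature name to a variable; equal variables within
# one rule must receive equal values (agreement).
RULES = [
    (("S", {}), [("NP", {"num": "n"}), ("VP", {"num": "n"})]),
    (("NP", {"num": "n"}), [("det", {"num": "n"}), ("noun", {"num": "n"})]),
    (("VP", {"num": "n"}), [("verb", {"num": "n", "subcat": "s"})]),
]
DOMAINS = {"num": ["sg", "pl"], "subcat": ["intrans", "trans"]}


def restrict(features, primary):
    """Keep only the primary features of a symbol."""
    return {f: v for f, v in features.items() if f in primary}


def symbol_name(category, features, binding):
    """Name of the context-free symbol for one feature assignment."""
    if not features:
        return category
    inner = ",".join(f"{f}={binding[v]}" for f, v in sorted(features.items()))
    return f"{category}[{inner}]"


def compile_backbone(rules, domains, primary):
    """Write out every rule for all values of its primary features."""
    productions = set()
    for lhs, rhs in rules:
        symbols = [(cat, restrict(feats, primary)) for cat, feats in [lhs] + rhs]
        var_feature = {v: f for _, feats in symbols for f, v in feats.items()}
        variables = sorted(var_feature)
        for values in product(*(domains[var_feature[v]] for v in variables)):
            binding = dict(zip(variables, values))
            names = [symbol_name(cat, feats, binding) for cat, feats in symbols]
            productions.add((names[0], tuple(names[1:])))
    return productions


backbone = compile_backbone(RULES, DOMAINS, primary={"num"})
```

With `num` as the only primary feature, each of the three rules is written out once per number value; with an empty set of primary features the original backbone is returned unchanged.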

Nagata [1992] reports on an experiment with a parser for Japanese, where
the original “coarse-grained” unification grammar (i.e., a grammar with few
context-free productions) was turned into a medium-grained grammar by
writing out the verb subcategorizations in the context-free backbone. He
obtained the following results for a representative set of Japanese sentences.



    rule granularity      coarse      medium      medium
    number of passes      one         one         two
    average runtime       30.2 sec    17.8 sec    8.7 sec
    relative speed        1.0         1.7         3.5



Maxwell and Kaplan [1993] did similar experiments with an LFG grammar
for English and came up with similar results.
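
The relative speeds reported by Nagata follow from the average runtimes by dividing the runtime of the coarse one-pass parser by the runtime of each variant. A short check, with configuration labels chosen here purely for illustration:

```python
# Average runtimes in seconds, as reported in Nagata's table.
runtimes = {
    "coarse, one pass": 30.2,
    "medium, one pass": 17.8,
    "medium, two passes": 8.7,
}

baseline = runtimes["coarse, one pass"]
# Relative speed = baseline runtime / runtime, rounded to one decimal.
relative_speed = {name: round(baseline / t, 1) for name, t in runtimes.items()}
```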

While it is only natural that enlarging the context-free backbone is done 
by hand for first experiments, this technique can be described at a very 
high level in parsing schemata with the use of restrictors. An implementation 
that compiles a mixed parser for a given unification grammar and restriction 
function would be a very useful tool for investigating which features should 
be primary in order to obtain an efficient parser. 



® One can obtain a context-free grammar from a unification grammar with a finite 
feature domain by treating each possible feature structure as a separate grammar 
symbol and writing out the productions for all (finite) cases accordingly.

9.8 Conclusion

This chapter did not present new results (with the exception of the notion of a
default restrictor in Section 9.6) but reviewed several issues of importance 
for the procedural aspects of unification grammar parsers. 

Most important for the overall subject of this book, viz., parsing of
context-free backbones of grammars, is Section 9.7. Some experiments with 
restricted one-pass parsers have been carried out independently for a Japanese 
and an English unification grammar. Both were equally encouraging. These 
experiments were conducted by rewriting (by hand) the unification grammar 
such that some important features were taken into the context-free backbone. 
The framework that is described here allows one to specify which features are
primary and which features are secondary at the level of a parsing schema.

The trend in unification grammars has been to encode more and more in- 
formation into the lexicon and less and less in the context-free rewrite rules. 
With context-free backbones dwindling away, context-free parsing techniques 
seemed to be less and less relevant for unification grammars. The experiments 
of Nagata and Maxwell and Kaplan have indicated that, while highly lexi- 
calized grammars with only a few productions are useful for specification 
purposes, an efficient implementation of a parser for such a grammar makes 
use of a larger context-free backbone defined by primary features. The impact 
of this conclusion is threefold: 

• an interesting research issue is how to determine an optimal set of primary 
features; 

• there is a need for unification grammar parser generators that take a pars- 
ing schema, grammar, and a restriction function as input and generate a 
two-pass parser for the augmented context-free backbone; 

• context-free parsing, which seemed to lose much of its relevance for natural 
language parsing, is fully back on stage. 

In Chapters 10 and 11 we apply the notion of parsing schemata to define
Left-Corner and Head-Corner chart parsers. These two chapters can be read
as a separate paper. From the theory that has been developed in Part II, we 
will use the notation, and the general idea of what a parsing schema is, but 
not much of the underlying theory. 

Chart parsers can be seen as rather straightforward implementations of 
parsing schemata.^ In Chapters 12-14 we will see other, more involved im- 
plementations of some simple parsing schemata; here we will develop rather 
complicated parsing schemata and do not worry a lot about implementation. 
We will briefly recapitulate the general notion of a chart parser and then 
present schemata, rather than parsing algorithms - leaving it to the reader to 
work out the appropriate details necessary to construct a full-fledged parser. 

Chapters 10 and 11 are based on joint work with Rieks op den Akker. 
Parts of it have been published in [Sikkel and op den Akker, 1993, 1996], some 
more details can be found in the technical report [Sikkel and op den Akker, 
1992]. New in these chapters is the embedding in the general framework of 
parsing schemata. The most substantial extensions to the cited material are 
the definition of a Head-Corner parser for unification grammars in 11.8 and a
detailed complexity analysis of the simplified context-free Head-Corner parser
in 11.6.

In Chapter 11 we will discuss Head-Corner parsing. The idea is to do the 
most important words first, and fill in the gaps later. The parser is rather 
complicated, due to the non-sequential way in which a string is processed. 
The easiest way to understand and formally define a Head-Corner parser is to
see it as a generalization of a Left-Corner parser. This chapter, therefore, can 
be seen as an introduction to Chapter 11. It should be remarked, however, 
that Left-Corner parsers are interesting in their own right, not just as a 
preliminary to the more complicated Head-Corner parsers. In Chapters 4 
and 6 we have given an LC parsing schema and shown that it is in fact a 
filtered (i.e., more efficient) version of the Earley schema. A disadvantage was 
that the description of the LC schema was rather more complicated, there 

^ Historically one should see this the other way round, of course. Parsing schemata 
were invented as a rather straightforward abstraction of chart parsers. 
is more variety in the types of deduction steps. The LC schemata that will
be defined here are in fact easier to read; we will make a somewhat more 
liberal use of items and introduce auxiliary items that do not fit exactly in 
the theory of Part II (but the theory could be expanded straightforwardly). 

The reader who thumbs through this chapter might easily be put off by 
the seemingly overwhelming amount of formulae. We would like to stress, 
however, that most of these can be skipped without losing track of the dis- 
cussion. The emphasis is on the intuition behind the schemata. From the 
informal discussion and examples, one should be able to get a fairly good idea
of what is going on. The formal details, then, only serve to lay down precisely 
what has been stated already informally. Most of the mathematics is covered 
in separate sections (10.3 and 10.5) that can be skipped entirely by the less 
mathematically inclined reader. 

A brief, informal introduction to chart parsing is given in Section 10.1. 
We define a Left-Corner parser in 10.2 and prove it to be correct in 10.3. The 
items that are used by the Left-Corner parser can be simplified, at the cost 
of slightly more complicated deduction steps. This is dealt with in 10.4. In 
Section 10.5, the relation between the two parsing schemata given here and 
the LC schema of Chapter 4 is studied, making use of the parsing schemata 
transformations defined in Chapters 5 and 6. Conclusions are summarized in 
10.6.



10.1 Chart parsers 

The notion of a chart parser was introduced by Martin Kay [1980]. The pre- 
sentation of chart parsers that is given here is somewhat unconventional, 
because we start from the notion of a parsing schema. For a conventional 
description of chart parsing, see, e.g., Winograd [1983]. We will first recapitulate
some important concepts of Part II and then introduce the Earley chart
parser. As a running example, we again use the sentence and grammar $G_1$
that have been used for illustration in previous chapters.

The notational conventions for context-free grammars that were introduced
in Section 3.1 apply throughout this chapter and the next one. We write
$A, B, \ldots$ for nonterminal symbols; $a, b, \ldots$ for terminal symbols; $X, Y, \ldots$ for
arbitrary symbols; $\alpha, \beta, \ldots$ for arbitrary strings of symbols. Positions in the
string $a_1 \ldots a_n$ are denoted by $i, j, k$ and $l, r$.

A parsing system for some grammar $G$ and string $a_1 \ldots a_n$ is a triple
$\mathbb{P} = \langle \mathcal{I}, H, D \rangle$ with $\mathcal{I}$ a set of items, $H$ an initial set of items (also called
hypotheses) and $D$ a set of deduction steps that allow new items to be derived
from already known items. The hypotheses in $H$ encode the sentence that is
to be parsed. For a sentence $a_1 \ldots a_n$ we take

$$H = \{[a_1, 0, 1], \ldots, [a_n, n-1, n]\}. \qquad (10.1)$$

It is not relevant whether $H$ is contained in the item set $\mathcal{I}$ or not; for the sake of
brevity we may omit the hypotheses when we specify an item set $\mathcal{I}$. Deduction
steps in $D$ are of the form

$$\eta_1, \ldots, \eta_k \vdash \xi.$$

The items $\eta_1, \ldots, \eta_k \in H \cup \mathcal{I}$ are called the antecedents and the item $\xi \in \mathcal{I}$
is called the consequent of a deduction step. If all antecedents of a deduction
step are recognized by a parser, then the consequent should also be recognized.
The set of valid items $\mathcal{V}(\mathbb{P})$ is the smallest subset of $\mathcal{I}$ that contains
the consequents of those deduction steps that have only hypotheses and valid
items as antecedents.

A parsing system P is called instantiated if hypotheses for a particular 
sentence are included. An uninstantiated parsing system only defines $\mathcal{I}$ and $D$
for a particular grammar $G$; $H$ is a formal parameter that can be instantiated
to a set of hypotheses (10.1) for any given input string. A parsing schema is 
defined for a class of grammars. For any particular given grammar a schema 
instantiates to an uninstantiated parsing system. 

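In order to define a parsing schema, one defines a parsing system for an
arbitrary grammar $G$. As a typical example, consider the parsing schema Earley
(that was discussed more thoroughly in Example 4.32). For an arbitrary
context-free grammar $G$ we have a system $\mathbb{P}_{Earley} = \langle \mathcal{I}_{Earley}, H, D_{Earley} \rangle$
with

$$\begin{aligned}
\mathcal{I}_{Earley} &= \{[A \to \alpha \bullet \beta, i, j] \mid A \to \alpha\beta \in P,\ 0 \le i \le j\}, \\
D^{Init} &= \{\vdash [S \to \bullet \gamma, 0, 0]\}, \\
D^{Scan} &= \{[A \to \alpha \bullet a\beta, i, j],\ [a, j, j+1] \vdash [A \to \alpha a \bullet \beta, i, j+1]\}, \\
D^{Compl} &= \{[A \to \alpha \bullet B\beta, i, j],\ [B \to \gamma \bullet, j, k] \vdash [A \to \alpha B \bullet \beta, i, k]\}, \\
D^{Pred} &= \{[A \to \alpha \bullet B\beta, i, j] \vdash [B \to \bullet \gamma, j, j]\}, \\
D_{Earley} &= D^{Init} \cup D^{Scan} \cup D^{Compl} \cup D^{Pred},
\end{aligned}$$

and $H$ to be instantiated for any input string by (10.1). Note that the initial
deduction steps have no antecedent; these are valid for every sentence. The
set of valid items for a string $a_1 \ldots a_n$ is

$$\mathcal{V}(\mathbb{P}_{Earley}) = \{[A \to \alpha \bullet \beta, i, j] \mid \alpha \Rightarrow^* a_{i+1} \ldots a_j \ \wedge\ S \Rightarrow^* a_1 \ldots a_i A \gamma \text{ for some } \gamma\}.$$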



A parser is obtained from a parsing schema by adding data structures and 
control structures. A chart parser, in its general form, is a most rudimentary 
kind of parser. 

A chart parser is equipped with two data structures, called chart and
agenda. Both data structures contain items that have been recognized by 
the parser. The control structure, in its elementary form, is very simple. At 
each step an item - the current item - is taken from the agenda and moved 
to the chart. For each deduction step that has the current item as one of its 
antecedents, the chart is searched for the other antecedents. If all antecedents 
of a deduction step are on the chart, then the consequent of that step is added 
to the agenda (unless it is already contained in the chart or agenda). The 
initial chart contains the hypotheses, representing (the lexical categories of) 
the words of the sentence. The initial agenda contains all items that can be 
deduced by an antecedentless deduction step, such as the initialize step above. The most
general specification of a chart parser is presented in Figure 10.1. 



    program chart parser
    begin
        create initial chart and agenda;
        while agenda is not empty
        do  delete (arbitrarily chosen) current item from agenda;
            add current to chart;
            for each item that can be recognized by current
                     in combination with other items in chart
            do  if item is neither in chart nor in agenda
                then add item to agenda fi
            od
        od
    end.



Fig. 10.1. General schema for a chart parser 
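
The general schema can be sketched in a few lines of Python. This is an illustration only: the function `chart_parse`, its `deduce` callback and the toy reachability system used to exercise it are assumptions made here, not part of the text. The callback is expected to yield the consequents of all deduction steps that have the current item as an antecedent and whose other antecedents are on the chart.

```python
def chart_parse(hypotheses, initial_items, deduce):
    """Agenda-driven closure of a set of deduction steps."""
    chart = set(hypotheses)          # initial chart: the hypotheses
    agenda = []
    for item in initial_items:       # initial agenda: antecedentless steps
        if item not in chart and item not in agenda:
            agenda.append(item)
    while agenda:
        current = agenda.pop()       # arbitrary choice; here: last in, first out
        chart.add(current)
        for item in deduce(current, chart):
            if item not in chart and item not in agenda:
                agenda.append(item)
    return chart


def reach_steps(current, chart):
    """Toy deduction step: reach(x), edge(x, y) |- reach(y)."""
    if current[0] == "reach":
        for other in list(chart):
            if other[0] == "edge" and other[1] == current[1]:
                yield ("reach", other[2])


edges = {("edge", "a", "b"), ("edge", "b", "c"), ("edge", "d", "e")}
final_chart = chart_parse(edges, [("reach", "a")], reach_steps)
reached = {item[1] for item in final_chart if item[0] == "reach"}
```

Replacing `agenda.pop()` by removal from the front of the list, or by a priority queue, gives the deterministic variants without touching the rest of the loop.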



In this general set-up, every deduction step can be successfully applied 
only once. The antecedent that is the last one to be added to the chart will 
trigger recognition of the consequent. It is evident that all valid items - and 
only those - are added to the chart in due course. If there is a finite number 
of valid items^ then the agenda must become empty sometime and the chart 
parser finishes. 

The basic chart parser is nondeterministic, in the sense that a current 
item is selected randomly from the agenda. A deterministic chart parser is 
obtained by specifying how the next current item is to be selected. The agenda 
can be structured as a stack (last in, first out), a queue (first in, first out), or 
a priority queue (priority by a linear order on $\mathcal{I}$). Sophistication in searching



^ We only consider relevant items. There are parsing schemata for which an- 
tecedentless deduction steps deduce items for every possible sentence position. 
As the set of deduction steps - by definition - is independent of the sentence, 
such a schema yields an infinite number of valid initial items, in order to cope 
with sentences of arbitrary length. An item is relevant for a given sentence if 
position markers contained in the item refer to positions that do not extend
beyond the length of the sentence. Cf. Definition 4.33. 
can be added by providing additional structure to the chart. See, e.g., Nijholt
[1994] for various standard ways to structure a chart. 

As an example, consider the Earley chart parser. The initial chart contains 
$H$ as in (10.1), the initial agenda is the set $\{[S \to \bullet \gamma, 0, 0] \mid S \to \gamma \in P\}$. For
each item that is taken from the agenda it must be checked whether a predict, 
scan or complete step can be applied. 
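
A minimal sketch of such an Earley chart parser is given below, assuming a grammar encoded as a list of `(lhs, rhs)` pairs and items encoded as named tuples. The names `Word`, `Item` and `earley_chart` are illustrative choices, and the agenda is an unordered stack rather than the prioritized agenda of the canonical parser.

```python
from collections import namedtuple

Word = namedtuple("Word", "sym left right")           # hypothesis [a, j-1, j]
Item = namedtuple("Item", "lhs rhs dot left right")   # [A -> alpha . beta, i, j]

G1 = [("S", ("NP", "VP")), ("NP", ("*det", "*n")), ("VP", ("*v", "NP"))]


def earley_chart(words, grammar=G1, start="S"):
    nonterminals = {lhs for lhs, _ in grammar}
    chart = {Word(a, j, j + 1) for j, a in enumerate(words)}
    agenda = [Item(lhs, rhs, 0, 0, 0) for lhs, rhs in grammar if lhs == start]
    while agenda:
        cur = agenda.pop()
        chart.add(cur)
        items = [it for it in chart if isinstance(it, Item)]
        new = []
        if cur.dot == len(cur.rhs):
            # complete, with the current item as the passive antecedent
            for it in items:
                if (it.dot < len(it.rhs) and it.rhs[it.dot] == cur.lhs
                        and it.right == cur.left):
                    new.append(it._replace(dot=it.dot + 1, right=cur.right))
        else:
            nxt = cur.rhs[cur.dot]
            if nxt in nonterminals:
                # predict
                new += [Item(nxt, rhs, 0, cur.right, cur.right)
                        for lhs, rhs in grammar if lhs == nxt]
                # complete, with the current item as the active antecedent
                for it in items:
                    if (it.lhs == nxt and it.dot == len(it.rhs)
                            and it.left == cur.right):
                        new.append(cur._replace(dot=cur.dot + 1, right=it.right))
            elif Word(nxt, cur.right, cur.right + 1) in chart:
                # scan
                new.append(cur._replace(dot=cur.dot + 1, right=cur.right + 1))
        for item in new:
            if item not in chart and item not in agenda:
                agenda.append(item)
    return chart


chart = earley_chart("*det *n *v *det *n".split())
accepted = Item("S", ("NP", "VP"), 2, 0, 5) in chart
```

Because the agenda is unordered here, both directions of the complete step have to be attempted; the ordering discussed next is what allows one of the two searches to be dropped in most cases.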

The canonical Earley chart parser, also called active chart parser, imposes
some ordering on the agenda (but the parser is still nondeterministic; different
items may have equal priority). An item $[A \to \alpha \bullet \beta, i, j]$ has priority over
an item with right position marker $j'$ if $j < j'$. The sentence is processed in
left-to-right fashion: an item $[A \to \alpha a \bullet \beta, i, j+1]$ that has successfully scanned
the word $a_{j+1}$ will remain on the agenda until all valid items with right position marker
$\le j$ have been recognized and moved to the chart. Because of this ordering,
some of the searches for fellow antecedents can be eliminated. If the current
item is of the form $[A \to \alpha \bullet B\beta, i, j]$, one must predict items of the form
$[B \to \bullet \gamma, j, j]$. A complete needs to be attempted only if there is an empty
production $B \to \varepsilon$. There is no need to look for items $[B \to \gamma \bullet, j, k]$ with $j < k$
because these cannot be in the chart yet. Items of the form $[A \to \alpha \bullet a\beta, i, j]$
and $[A \to \alpha \bullet B\beta, i, j]$ are called active items and look forward (to the right)
for a match; items of the form $[a, j-1, j]$ and $[A \to \alpha \bullet, i, j]$ are called passive
items and look backward (to the left) for a match.

Grammar $G_1$ is defined by the productions

$$\begin{aligned}
S &\to \text{NP VP}, \\
\text{NP} &\to \text{*det *n}, \\
\text{VP} &\to \text{*v NP}.
\end{aligned}$$

This grammar produces only one sentence: the lexical categories of our canon- 
ical example sentence “the cat catches a mouse.” It is on purpose that we 
choose a grammar that allows only a single parse tree. The intuition behind 
the various chart parsers that will be introduced here can be explained by 
visualizing how each parser steps through this single parse tree. 

Any reasonable grammar will allow different sentences and parse trees. 
A chart parser, then, will walk through all parse trees for the sentence and 
all partial parse trees for valid prefixes of that sentence. But all these tree 
walks are interlaced; from their general behaviour it is not at all obvious 
that the Earley, LC and HC chart parsers actually perform tree walks. If 
some specific tree is singled out, however, the items that relate only to that 
particular tree will follow some pattern that is characteristic for the chart 
parser under discussion. Hence we take an example in which only a single 
parse tree exists; in this way the salient features of our different chart parsers 
will stand out. 

It is not a general feature of chart parsers that they recognize all items 
for a given tree by making some walk through that tree. A CYK chart parser 
clearly does not do that. That the Earley and LC parsers do perform a left-to-right
walk through a parse tree is a consequence of the underlying design
decision that the entire left context is taken into account for item recognition.
In this way the work for a sequential parser is minimized, but possibilities
for parallel processing are greatly reduced.

The final chart of the Earley chart parser for grammar $G_1$ and the example
sentence is shown in Figure 10.2. For each item it is indicated how it was 
added to the chart. In Figure 10.3 a top-down left-to-right walk through 
the parse tree is shown. We distinguish steps down from a nonterminal to 
a nonterminal, steps up from a nonterminal to a nonterminal, and terminal 
steps from a nonterminal down to a terminal and up again. 





           item              motivation
    (i)    [*det, 0, 1]      initial chart
    (ii)   [*n, 1, 2]        initial chart
    (iii)  [*v, 2, 3]        initial chart
    (iv)

Fig. 10.2. The final Earley chart

Fig. 10.3. The Earley tree walk



A terminal step comprises two steps, in fact. It is counted as a single step 
so as to create a one-to-one correspondence between non-initial items on the 
chart and steps in the tree walk. A terminal step from A down to a and back 
to A corresponds to scanning an a in a production with left-hand side A; a 
step up from $B$ to $A$ corresponds to a complete in which the dot is moved
over $B$ in a production with left-hand side $A$; a step down from $A$ to $B$
corresponds to predicting a production with left-hand side $B$.



10.2 Left-Corner chart parsing

We will define a chart parser that is based on a generalization of the Left- 
Corner (LC) algorithm known from the literature. 

Deterministic Left-Corner parsing^ has been introduced by Rosenkrantz
and Lewis [1970]. An extensive treatise on LC parsing is given by op den 
Akker [1988]. First ideas of a generalized LC parser, although not under 
that name, can be traced back to Pratt [1975]. A left-corner style parser in 
Prolog was presented by Matsumoto et al. [1983]. Their BUP parser over- 
comes the general problem in Definite Clause Grammars that left-recursion 
cannot be handled. BUP is limited to acyclic, ε-free grammars. As usual in
Prolog implementations, ambiguities are handled by backtracking. A differ- 
ent way to handle ambiguities is by means of a graph-structured stack.^ A 
left-corner parser based on such a data structure is described by Nederhof 
[1993]. Our approach to LC parsing is chart-based. It is in fact quite similar 
to the directed bottom-up parser of Kay [1980]. 

We describe a (generalized) Left-Corner parsing algorithm in the form of 
a chart parser. The line of presentation is somewhat different from Chapter 
4, where a parsing schema LC was derived from the Earley schema. We 
will first concentrate on the intuition and describe the parser from a “left- 
corner” perspective. A derivation of this parser from the schemata in Part II 
is postponed to 10.5. 

^ In a deterministic parser, not more than a single action can be undertaken in 
any circumstances. One could think of a chart parser where there is never more 
than a single item on the agenda. A deterministic parser can parse a sentence 
in linear time, but in order to obtain determinism, the class of grammars that 
can be used has to be severely restricted. This is, in general, acceptable for 
programming languages but impossible for natural languages. A necessary (but 
not sufficient) condition for determinism is that the grammar be unambiguous. 

^ The term “Generalized LC” has been introduced by Demers [1977] for a rather 
different concept. He generalized the notion of Left Corner, deriving a framework 
that describes a class of parsers and associated grammars ranging from LL($k$) via
LC($k$) to LR($k$). In the context of Natural Language parsing, the more obvious
meaning of generalized LC parsing is that the grammar need not be LC($k$) for
any k. Hence, the parser is nondeterministic; for a chart parser this does not 
cause problems. 

The semantic ambiguity of the noun phrase “Generalized LC parsing” duly re- 
flects the syntactic ambiguity: we are concerned with [Generalized [Left-Corner 
Parsing]], whereas Demers discussed [[Generalized Left-Corner] parsing].

^ Cf. Chapter 12 where a graph-structured stack for a generalized LR parser is 
discussed. 

A Left-Corner parser, like an Earley parser, proceeds through the sentence
from left to right. The type of items and the motivation behind the steps is
different, however. An important difference is in the way in which top-down
predictions are used to guide the bottom-up recognition. Predict steps in
Earley’s algorithm are replaced by goals that the LC parser tries to satisfy
in a purely bottom-up manner. Bottom-up recognition is guided towards the
right goal by means of the left-corner relation.

Definition 10.1. (transitive and reflexive left-corner relation)

The left corner is the leftmost symbol in the right-hand side of a production.
$A \to X\alpha$ has left corner $X$; an empty production $A \to \varepsilon$ has left corner $\varepsilon$.

The relation $>_\ell$ on $N \times (V \cup \{\varepsilon\})$ is defined by

$A >_\ell U$ if there is a production $A \to \alpha \in P$ which has left corner $U$.

The transitive and reflexive closure of $>_\ell$ is denoted $>_\ell^*$. □

For our trivial example grammar the transitive left-corner relation $>_\ell^*$ comprises

$$S >_\ell^* S, \quad S >_\ell^* \text{NP}, \quad S >_\ell^* \text{*det},$$
$$\text{NP} >_\ell^* \text{NP}, \quad \text{NP} >_\ell^* \text{*det}, \quad \text{VP} >_\ell^* \text{VP}, \quad \text{VP} >_\ell^* \text{*v}.$$
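
The relation can be computed by a simple fixed-point iteration. The sketch below assumes a grammar encoded as a list of `(lhs, rhs)` pairs and represents the empty left corner by the string `"ε"`; both are conventions chosen here for illustration.

```python
G1 = [("S", ("NP", "VP")), ("NP", ("*det", "*n")), ("VP", ("*v", "NP"))]


def left_corner_closure(grammar):
    """Return all pairs (A, U) such that A >*_l U."""
    nonterminals = {lhs for lhs, _ in grammar}
    # direct relation: A >_l U if some production for A has left corner U
    direct = {(lhs, rhs[0] if rhs else "ε") for lhs, rhs in grammar}
    # reflexive part (on nonterminals) plus the direct pairs
    closure = {(a, a) for a in nonterminals} | direct
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, u in direct:
                if b == c and (a, u) not in closure:
                    closure.add((a, u))   # A >* B and B > U give A >* U
                    changed = True
    return closure


lc = left_corner_closure(G1)
```

For the example grammar this yields exactly the seven pairs listed above.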

The LC chart parser uses the following kinds of items: 

$[i, A]$: predict items or goals,

$[A; B \to \alpha \bullet \beta, i, j]$: left-corner (LC) items,

$[a, j-1, j]$: terminal items as in the Earley chart parser.

Recognition of items should be interpreted as follows. 

• A predict item [i,A] will be recognized if preceding items indicate that a 
constituent A should be looked for, starting at position i. 

• An LC item $[A; B \to \alpha \bullet \beta, i, j]$ will be recognized if $[i, A]$ is set as a goal, $A$
could start with a $B$ (i.e. $A >_\ell^* B$) and $\alpha \Rightarrow^* a_{i+1} \ldots a_j$ has been established.
In other words, an LC item incorporates a prefix for a given goal. 

Parsing our sentence starts with a goal $[0, S]$. The first word is $[\text{*det}, 0, 1]$.
It is known by the parser that *det is a transitive left corner of $S$. We can
“move up” one step from *det in the tree walk if we find a symbol $A$ such
that $S >_\ell^* A$ and $A >_\ell \text{*det}$. In our case, this symbol is NP and the deduction
step that applies here is

$$[0, S],\ [\text{*det}, 0, 1] \vdash [S; \text{NP} \to \text{*det} \bullet \text{*n}, 0, 1].$$

The scan that includes the noun in the recognized part of the NP is similar
to Earley’s:

$$[S; \text{NP} \to \text{*det} \bullet \text{*n}, 0, 1],\ [\text{*n}, 1, 2] \vdash [S; \text{NP} \to \text{*det *n} \bullet, 0, 2].$$

Having recognized a complete NP, we can move up again to a left-hand side
symbol that is nearer to $S$.

$$[S; \text{NP} \to \text{*det *n} \bullet, 0, 2] \vdash [S; S \to \text{NP} \bullet \text{VP}, 0, 2].$$

In general it is not necessary that both $S$ symbols refer to the same node in
the parse tree. If the grammar had a production $S \to S\ \text{PP}$, we might
step up later from the left-hand side $S$ to a mother node also labelled $S$.

We have now deduced an item with the dot preceding a nonterminal sym- 
bol. We carry out a predict step that is not so much different from Earley’s: 

$$[S; S \to \text{NP} \bullet \text{VP}, 0, 2] \vdash [2, \text{VP}].$$

The LC parser continues in similar fashion. The final chart is shown in Figure 10.4
(the initial chart has been deleted for the sake of brevity). In the
motivation column the names and antecedents of the deduction steps are 
listed. For left-corner steps we distinguish between terminal and nonterminal 
left corners (generically denoted by letters a and A). 

Fig. 10.4. A completed LC chart (excluding terminal items)




The corresponding left-corner tree walk is shown in Figure 10.5. Like the 
Earley tree walk, the parse tree is visited in top-down left-to-right order. The 
main difference is that steps down to left corners do not cause the recognition 
of an item; these steps are encoded in the $>_\ell^*$ relation and do not need to
be taken explicitly. Steps down to nonterminal daughters that are not a left 
corner correspond to setting a new goal. Terminal daughters are scanned in 
a single step. These steps are in fact identical to the terminal steps in the 
Earley tree walk. The layout in figure 10.5 has been adapted, however, to 
underline the bottom-up direction of item recognition. The idea is that 

• top-down arrows correspond to setting new goals, 

• bottom-up arrows correspond to recognizing LC items. 



We will define a parsing schema® that underlies the LC chart parser. The 
parsing schema is called pLC for predictive LC, because the identifier LC 
was already used in Chapter 4 for Example 4.36. 

Schema 10.2. (pLC) 

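We define a parsing system $\mathbb{P}_{pLC}$ for an arbitrary context-free grammar
$G \in \mathcal{CFG}$. The domain $\mathcal{I}_{pLC}$ is given by

$$\begin{aligned}
\mathcal{I}^{Pred} &= \{[i, A] \mid A \in N \ \wedge\ i \ge 0\}, \\
\mathcal{I}^{LC(i)} &= \{[A; B \to X\alpha \bullet \beta, i, j] \mid A \in N \ \wedge\ A >_\ell^* B \ \wedge\ B \to X\alpha\beta \in P \ \wedge\ 0 \le i \le j\}, \\
\mathcal{I}^{LC(ii)} &= \{[A; B \to \bullet, j, j] \mid A \in N \ \wedge\ A >_\ell^* B \ \wedge\ B \to \varepsilon \in P \ \wedge\ j \ge 0\}, \\
\mathcal{I}_{pLC} &= \mathcal{I}^{Pred} \cup \mathcal{I}^{LC(i)} \cup \mathcal{I}^{LC(ii)}.
\end{aligned}$$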



It is important to remark that LC items $[A; B \to \alpha \bullet \beta, i, j]$ exist only for $A$ and
$B$ such that $A >_\ell^* B$. Deduction steps, by definition, can only deduce items
in $\mathcal{I}$. Hence, when we specify the various kinds of deduction steps, we need
not state explicitly that items $[A; B \to \alpha \bullet \beta, i, j]$ may occur as a consequent
only if $A >_\ell^* B$. This is enforced implicitly by the definition of the domain of
items.

For the set of deduction steps, we define subsets for initialize, scan and
complete steps similar to the Earley schema. Predict steps set new goals as
explained above. The left-corner steps come in three varieties, for terminal,
nonterminal and empty left corners, generically denoted by $a$, $A$ and $\varepsilon$. The
set $D$ is defined by

$$\begin{aligned}
D^{Init} &= \{\vdash [0, S]\}, \\
D^{LC(a)} &= \{[i, C],\ [a, i, i+1] \vdash [C; B \to a \bullet \beta, i, i+1]\}, \\
D^{LC(A)} &= \{[C; A \to \gamma \bullet, i, j] \vdash [C; B \to A \bullet \beta, i, j]\}, \\
D^{LC(\varepsilon)} &= \{[i, C] \vdash [C; B \to \bullet, i, i]\}, \\
D^{Pred} &= \{[C; B \to \alpha \bullet A\beta, i, j] \vdash [j, A]\}, \\
D^{Scan} &= \{[C; B \to \alpha \bullet a\beta, i, j],\ [a, j, j+1] \vdash [C; B \to \alpha a \bullet \beta, i, j+1]\}, \\
D^{Compl} &= \{[C; B \to \alpha \bullet A\beta, i, j],\ [A; A \to \gamma \bullet, j, k] \vdash [C; B \to \alpha A \bullet \beta, i, k]\}, \\
D_{pLC} &= D^{Init} \cup D^{LC(a)} \cup D^{LC(A)} \cup D^{LC(\varepsilon)} \cup D^{Pred} \cup D^{Scan} \cup D^{Compl}.
\end{aligned}$$

With the set of hypotheses as a formal parameter for the string to be
parsed, we have fully specified the parsing system $\mathbb{P}_{pLC} = \langle \mathcal{I}_{pLC}, H, D_{pLC} \rangle$.

□

® Parsing schemata in this chapter are more liberal than the parsing schemata
defined in Chapter 4. Here we define types of items ad hoc, such that these suit our
purposes, while in Chapter 4 items were defined as a subset of a partition of the
set of trees for a given grammar. In 10.5 we will argue that our liberal approach
here is in fact an extension of the formal theory of Part II.

A chart parser is obtained from pLC as follows. 

• The initial chart comprises the hypotheses for the given sentence; 

• the initial agenda contains the consequent of the (only) initialize deduction 
step. 
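
A minimal sketch of a chart parser obtained in this way is given below. It is one possible reading of the pLC deduction steps, not a reference implementation: the item encodings `Word`, `Goal` and `LC`, the helper `left_corner_closure` and the stack-like agenda are assumptions made for illustration.

```python
from collections import namedtuple

Word = namedtuple("Word", "sym left right")            # terminal item [a, j-1, j]
Goal = namedtuple("Goal", "pos cat")                   # predict item [i, A]
LC = namedtuple("LC", "goal lhs rhs dot left right")   # [A; B -> alpha . beta, i, j]

G1 = [("S", ("NP", "VP")), ("NP", ("*det", "*n")), ("VP", ("*v", "NP"))]


def left_corner_closure(grammar):
    direct = {(lhs, rhs[0]) for lhs, rhs in grammar if rhs}
    closure = {(lhs, lhs) for lhs, _ in grammar} | direct
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, u in direct:
                if b == c and (a, u) not in closure:
                    closure.add((a, u))
                    changed = True
    return closure


def plc_chart(words, grammar=G1, start="S"):
    nonterminals = {lhs for lhs, _ in grammar}
    lc = left_corner_closure(grammar)
    chart = {Word(a, j, j + 1) for j, a in enumerate(words)}
    agenda = [Goal(0, start)]                                    # init
    while agenda:
        cur = agenda.pop()
        chart.add(cur)
        lc_items = [it for it in chart if isinstance(it, LC)]
        new = []
        if isinstance(cur, Goal):
            i, c = cur
            for b, rhs in grammar:
                if (c, b) not in lc:
                    continue
                if not rhs:                                      # LC(epsilon)
                    new.append(LC(c, b, rhs, 0, i, i))
                elif Word(rhs[0], i, i + 1) in chart:            # LC(a)
                    new.append(LC(c, b, rhs, 1, i, i + 1))
        elif cur.dot == len(cur.rhs):
            for b, rhs in grammar:                               # LC(A)
                if rhs and rhs[0] == cur.lhs and (cur.goal, b) in lc:
                    new.append(LC(cur.goal, b, rhs, 1, cur.left, cur.right))
            if cur.goal == cur.lhs:                              # complete
                for it in lc_items:
                    if (it.dot < len(it.rhs) and it.rhs[it.dot] == cur.lhs
                            and it.right == cur.left):
                        new.append(it._replace(dot=it.dot + 1, right=cur.right))
        else:
            nxt = cur.rhs[cur.dot]
            if nxt in nonterminals:
                new.append(Goal(cur.right, nxt))                 # predict
                for it in lc_items:                              # complete
                    if (it.goal == nxt and it.lhs == nxt
                            and it.dot == len(it.rhs) and it.left == cur.right):
                        new.append(cur._replace(dot=cur.dot + 1, right=it.right))
            elif Word(nxt, cur.right, cur.right + 1) in chart:   # scan
                new.append(cur._replace(dot=cur.dot + 1, right=cur.right + 1))
        for item in new:
            if item not in chart and item not in agenda:
                agenda.append(item)
    return chart


chart = plc_chart("*det *n *v *det *n".split())
accepted = LC("S", "S", ("NP", "VP"), 2, 0, 5) in chart
```

The restriction that LC items exist only for a goal $A$ and left-hand side $B$ with $A >_\ell^* B$ appears in the code as the explicit membership tests on `lc`.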



10.3 Correctness of the LC chart parser 

The chart parser based on pLC is correct if, for an arbitrary grammar $G$ and
any string $a_1 \ldots a_n$, it holds that

$$[S; S \to \gamma \bullet, 0, n] \in \mathcal{V}(\mathbb{P}_{pLC}) \quad \text{if and only if} \quad S \Rightarrow \gamma \Rightarrow^* a_1 \ldots a_n$$

(cf. Definition 4.22). 

Unlike for the Earley chart parser, however, it is not trivial to determine the
set of valid items $\mathcal{V}(\mathbb{P}_{pLC})$. We will proceed as follows. First a set of viable
items is postulated, i.e., items that ought to be recognized by the parser.
Subsequently, we will prove that $\mathcal{V}(\mathbb{P}_{pLC})$ contains all viable items and no
other items.

Definition 10.3. ((pLC-)viable items)

We define pLC-viability (or shortly viability) for each type of item.

• Let $\Omega$ denote the set of viable predict items. $\Omega$ is the smallest set satisfying
the following conditions:
  o $[0, S] \in \Omega$;
  o if there are $A, B, C, X, \alpha, \beta, i, j$ such that
    (i) $[i, A] \in \Omega$,
    (ii) $A >_\ell^* B$,
    (iii) $B \Rightarrow X\alpha C\beta$,
    (iv) $X\alpha \Rightarrow^* a_{i+1} \ldots a_j$
    then $[j, C] \in \Omega$.

• A left-corner item $[A; B \to \alpha \bullet \beta, i, j]$ is viable if
    (i) $[i, A]$ is viable,
    (ii) $A >_\ell^* B$,
    (iii) $B \Rightarrow \alpha\beta$, and
    (iv) $\alpha \Rightarrow^* a_{i+1} \ldots a_j$.

• A terminal item $[a, j-1, j]$ is viable if $a = a_j$. □

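Note that, by definition, items $[A; B \to \alpha \bullet \beta, i, j]$ come in two variants: either
$\alpha \neq \varepsilon$ or $\alpha = \beta = \varepsilon$. Both cases are covered in the above definition; it should
be clear that $\mathcal{I}$ does not contain items $[A; B \to \bullet \beta, i, j]$ with $\beta \neq \varepsilon$.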

It is possible to give a direct characterization of viable predict items that 
is equivalent to the inductive specification in the above definition. 

Lemma 10.4. 

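Let $\Omega$ be as in Definition 10.3 and $\Omega'$ defined by

$$\Omega' = \{[0, S]\} \cup \{[i, A] \mid \exists k, B, X, \alpha, \beta, \gamma : S \Rightarrow^* a_1 \ldots a_k B \gamma \ \wedge\ B \Rightarrow X\alpha A\beta \ \wedge\ X\alpha \Rightarrow^* a_{k+1} \ldots a_i\}.$$

Then $\Omega' = \Omega$.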

Proof. The proof makes use of the “walk length function” w that will be 
defined in the proof of Lemma 10.7. Therefore it is postponed to page 215. 

□ 

From Definition 10.3 it follows immediately how the grammatical correct- 
ness of a string can be expressed by means of viable LC items. 

Corollary 10.5. 

An item $[S; S \to \gamma \bullet, 0, n]$ is pLC-viable for a string $a_1 \ldots a_n$ if and only if

$S \Rightarrow \gamma \Rightarrow^* a_1 \ldots a_n$. □

In order to establish the correctness of the LC parser it remains to be proven 
that viability and validity are equivalent properties in pLC. 

Lemma 10.6. Any item contained in $\mathcal{V}(\mathbb{P}_{pLC})$ is pLC-viable
(i.e., the LC chart parser is sound). 

Proof This follows straightforwardly from the following observations: 

• all initial items are viable; 

• for each deduction step in D it holds that viability of the consequent is 

implied by the viability of the antecedents. □ 

Lemma 10.7. 

All pLC-viable items are contained in $\mathcal{V}(\mathbb{P}_{pLC})$

(i.e., the LC chart parser is complete). 

Proof. We will first explain the general idea, which is quite simple, before we
spell out the somewhat cumbersome details.

A generic method to prove the completeness of a parsing schema (and
hence a chart parser) is the following. To each viable item $\xi$ a number $f(\xi)$
is assigned, that has some relation to the minimum number of steps needed
for recognizing $\xi$. If we are able to establish for each viable item $\xi$ that
there is a deduction step $\eta_1, \ldots, \eta_k \vdash \xi$ such that $\eta_1, \ldots, \eta_k$ are viable and,
moreover,

  $f(\xi) > f(\eta_i)$ for $1 \le i \le k$,    (10.2)

then it follows by induction on the value of $f$ that all viable items are valid.
The key problem is to pick the right function $f$.

For our LC chart parser we define a function w that corresponds to the 
(length of the top-down left-to-right) walk through a (partial) parse tree that 
is needed to derive the item. For the tree walk we count all edge traversals; 
the dotted lines in Figure 10.5 as well as the arrows. The definition of w 
makes use of the following parameters: 

• $\pi$: the size of the tree walk for the relevant predict item;

• $\lambda$: the number of edges traversed in top-down direction by the $>_\ell^*$ relation;

• $\mu$: the length of a derivation $X\alpha \Rightarrow^* a_{i+1} \ldots a_j$ for items $[A; B \to X\alpha \bullet \beta, i, j]$.

Furthermore, we have to take into account that in general different (partial) 
parse trees may exist that account for the same item. Hence we have to take 
the minimum number of steps in such a walk in an arbitrary tree. 

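The partial function $w : \mathcal{I}_{pLC} \to \mathbb{N}$ is defined by

• $w([0, S]) = 0$,

• $w([j, C]) = \min\{\, \pi + (\lambda + 1) + 2\mu \mid \exists A, B, X, \alpha, \beta, i :\ [A; B \to X\alpha \bullet C\beta, i, j]$ is viable $\wedge\ \pi = w([i, A]) \ \wedge\ A >_\ell^\lambda B \ \wedge\ X\alpha \Rightarrow^\mu a_{i+1} \ldots a_j \,\}$,

• $w([A; B \to \alpha \bullet \beta, i, j]) = \min\{\, \pi + \lambda + 2\mu \mid \pi = w([i, A]) \ \wedge\ A >_\ell^\lambda B \ \wedge\ \alpha \Rightarrow^\mu a_{i+1} \ldots a_j \,\}$,

• $w([j, C])$ and $w([A; B \to \alpha \bullet \beta, i, j])$ are undefined if the conditions in the
  preceding two cases cannot be satisfied (i.e., the minimum is taken over an
  empty set).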

We count $2\mu$, as each edge of the derivation tree is traversed twice. For predict
items $[j, C]$ we count $\lambda + 1$ because $\lambda$ edges are skipped by $A >_\ell^* B$ and an
additional edge is moved down from $B$ to $C$.

In order to finish the proof we have to establish

(i) $w(\xi)$ is defined for every viable item $\xi$;
(ii) condition (10.2) holds for each viable item $\xi$.

As to the first point, it is easy to verify that for each viable item there is at
least one choice of $\pi$, $\lambda$, $\mu$ for which the conditions are fulfilled; hence (by induction
on the definition of viability) $w$ is defined for all viable items.

Thus it remains to be shown for each viable item $\xi$ that there is a deduction
step $\eta_1, \ldots, \eta_k \vdash \xi$ such that all $\eta_i$ are viable and have a lower $w$-value than
$\xi$. We will spell it out as an exemplary case; in subsequent proofs this part
will be omitted. We distinguish between

• predict items (a);

• different types of LC items:

  ◦ LC items with the dot in leftmost position (b);
  ◦ LC items with a single symbol preceding the dot:
    • a terminal symbol preceding the dot (c),
    • a nonterminal symbol preceding the dot (d);
  ◦ LC items with two or more symbols preceding the dot:
    • a terminal symbol immediately preceding the dot (e),
    • a nonterminal symbol immediately preceding the dot (f).

For viable items of each type we will give a deduction step with viable
antecedents and show that the condition on $w$-values is satisfied.

(a) Let $\xi = [j, C]$.

    From the viability of $\xi$ we obtain that there are $A, B, X, \alpha, \beta, i, \pi, \lambda, \mu$
    such that
    (i) $[i, A]$ is viable,
    (ii) $w([i, A]) = \pi$,
    (iii) $A >_\ell^\lambda B$,
    (iv) $B \Rightarrow X\alpha C\beta$,
    (v) $X\alpha \Rightarrow^\mu a_{i+1} \ldots a_j$,
    (vi) $w(\xi) = \pi + (\lambda + 1) + 2\mu$.

    From (i)-(v) it follows that $\eta = [A; B \to X\alpha \bullet C\beta, i, j]$ is viable and $\eta \vdash \xi$.
    Moreover, $w(\eta) = \pi + \lambda + 2\mu = w(\xi) - 1$.

(b) Let $\xi = [A; B \to \bullet, i, i]$ be viable.

    $\xi$ can only be recognized by $[i, A] \vdash \xi$ where $A >_\ell^* B$.
    Moreover, $w(\xi) = w([i, A]) + \lambda + 2$ with minimal $\lambda$ such that $A >_\ell^\lambda B$.

(c) Let $\xi = [A; B \to a \bullet \beta, i, i+1]$ be viable.

    $\xi$ can only be recognized by $[i, A] \vdash \xi$ where $A >_\ell^* a$.
    Moreover, $w(\xi) = w([i, A]) + \lambda + 2$ with minimal $\lambda$ such that $A >_\ell^\lambda a$.

(d) Let $\xi = [A; B \to C \bullet \beta, i, j]$ be viable.

    There must be some viable $\eta = [A; C \to \gamma\bullet, i, j]$ such that $[i, A], \eta \vdash \xi$.
    Let $A >_\ell^\lambda B$ and $C \Rightarrow^\mu a_{i+1} \ldots a_j$ for minimal $\lambda$ and $\mu$;
    then $A >_\ell^{\lambda+1} C$ and $\gamma \Rightarrow^{\mu-1} a_{i+1} \ldots a_j$.
    Hence, $w(\xi) = w([i, A]) + \lambda + 2\mu > w([i, A])$.
    Moreover, $w(\eta) = w([i, A]) + (\lambda + 1) + 2(\mu - 1) = w(\xi) - 1$.

(e) Let $\xi = [A; B \to X\alpha a \bullet \beta, i, j]$ be viable.

    Then $\eta = [A; B \to X\alpha \bullet a\beta, i, j-1]$ is viable and $\eta, [a, j-1, j] \vdash \xi$.
    Clearly, $w(\eta) = w(\xi) - 2$.

(f) Let $\xi = [A; B \to X\alpha C \bullet \beta, i, k]$ be viable.

    Then it must hold that
    (i) $[i, A]$ is viable.
    Furthermore, there are $\lambda, p, q$ such that
    (ii) $A >_\ell^\lambda B$,
    (iii) $X\alpha \Rightarrow^p a_{i+1} \ldots a_j$,
    (iv) $C \Rightarrow \gamma \Rightarrow^{q-1} a_{j+1} \ldots a_k$,
    (v) $w(\xi) = w([i, A]) + \lambda + 2(p + q)$.

    From (i)-(iii) it follows that $\eta = [A; B \to X\alpha \bullet C\beta, i, j]$ is viable and $[j, C]$
    is viable.
    With (iv) we obtain that $\zeta = [C; C \to \gamma\bullet, j, k]$ is viable.
    Furthermore, $\eta, \zeta \vdash \xi$ and it follows that $w(\eta) = w(\xi) - 2q < w(\xi)$;
    $w(\zeta) = w([j, C]) + 2(q - 1) \le w([i, A]) + (\lambda + 1) + 2(p + q - 1) = w(\xi) - 1$.

Hence we may conclude, by simultaneous induction on the $w$-value for all
types of items, that pLC-viable items are contained in $\mathcal{V}(\mathbf{P}_{pLC})$. □

Theorem 10.8. (correctness of the pLC chart parser) 

For any grammar $G \in \mathcal{CFG}$ and string $a_1 \ldots a_n$ it holds that

  $[S; S \to \gamma\bullet, 0, n] \in \mathcal{V}(\mathbf{P}_{pLC})$ if and only if $S \Rightarrow \gamma \Rightarrow^* a_1 \ldots a_n$.

Proof: directly from Lemmata 10.6 and 10.7 and Corollary 10.5. □

It remains to prove that the equality $\Omega = \Omega'$ holds for $\Omega$, $\Omega'$ as defined
in Definition 10.3 and Lemma 10.4. In that proof we make use of the tree
walk function that has been defined in the proof of Lemma 10.7 (but, in order 
to avoid circularity, none of the results established after Lemma 10.4 should 
be used). 

Proof of Lemma 10.4. 

(i) $\Omega \subseteq \Omega'$ is proven by induction on $w([i, A])$.

    Let $[j, C] \in \Omega$ be viable and predicted by $[A; B \to X\alpha \bullet C\beta, i, j]$. Then from
    $w([i, A]) < w([j, C])$ we may assume $[i, A] \in \Omega'$ and it follows trivially
    that $[j, C] \in \Omega'$.

(ii) $\Omega \supseteq \Omega'$ is obtained as follows.

    Let $[i, A] \in \Omega'$, with $S \Rightarrow^* a_1 \ldots a_h B\gamma$, $B \Rightarrow X\alpha A\beta$, $X\alpha \Rightarrow^* a_{h+1} \ldots a_i$.

    In the derivation $S \Rightarrow^* a_1 \ldots a_h B\gamma$, we must identify the most direct
    ancestor of $B$ (or possibly $B$ itself) which is not a left corner. Let's call
    this $D$. If $B$ is not a left corner, then $D$ is $B$. Otherwise, $B$ has been
    produced by some $E \to \alpha' B \beta'$. If $\alpha' \ne \varepsilon$, then $D = E$; otherwise $E$ will
    have been produced by some $F \to \alpha'' E \beta''$, and so on.

    $D$ has been produced by some $C \to Y\delta D\gamma'$, hence there is a derivation

      $S \Rightarrow^* \delta' C \gamma'' \Rightarrow \delta' Y\delta D\gamma'\gamma'' \Rightarrow^* a_1 \ldots a_h D\gamma'\gamma'' \Rightarrow^* a_1 \ldots a_h B\gamma$.

    Clearly, $[h, D] \in \Omega$, $D >_\ell^* B$, $B \Rightarrow X\alpha A\beta$, $X\alpha \Rightarrow^* a_{h+1} \ldots a_i$, hence
    $[i, A] \in \Omega$. □

10.4 An LC chart parser with simplified items 

An LC item $[A; B \to \alpha \bullet \beta, i, j]$ can be seen as consisting of a predicted part $[A, i]$
and a recognized part $[B \to \alpha \bullet \beta, i, j]$. The LC chart parser can be simplified
somewhat by disconnecting these two parts. The predict parts correspond to 
predict items that are contained on the chart already; the recognized parts 
are in fact conventional Earley items. 

The reason for not introducing this simplification straight away is the 
relation between the LC chart parser and the HC chart parser that will 
be discussed in the next chapter. In the HC case there are good reasons 
for keeping the predicted and recognized parts within a single item, when 
unification grammars rather than context-free grammars are used. 

A simplified parsing schema for the LC chart parser, sLC, is derived from 
the pLC schema as follows. 

• LC items are replaced by Earley items, 

• The deduction steps are extended, where necessary, with extra antecedents 
and conditions. 

Schema 10.9. (sLC) 

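We define a parsing system for an arbitrary context-free grammar $G \in \mathcal{CFG}$.
The domain $\mathcal{I}_{sLC}$ and deduction steps $D_{sLC}$ are given by

  $\mathcal{I}^{Pred} = \{[i, A] \mid A \in N \wedge i \ge 0\}$,

  $\mathcal{I}^{LC(i)} = \{[B \to X\alpha \bullet \beta, i, j] \mid B \to X\alpha\beta \in P \wedge 0 \le i \le j\}$,

  $\mathcal{I}^{LC(ii)} = \{[B \to \bullet, j, j] \mid B \to \varepsilon \in P \wedge j \ge 0\}$,

  $\mathcal{I}_{sLC} = \mathcal{I}^{Pred} \cup \mathcal{I}^{LC(i)} \cup \mathcal{I}^{LC(ii)}$;

  $D^{Init} = \{\vdash [0, S]\}$,

  $D^{LC(a)} = \{[i, C], [a, i, i+1] \vdash [B \to a \bullet \beta, i, i+1] \mid C >_\ell^* B\}$,

  $D^{LC(A)} = \{[i, C], [A \to \gamma\bullet, i, j] \vdash [B \to A \bullet \beta, i, j] \mid C >_\ell^* B\}$,

  $D^{LC(\varepsilon)} = \{[i, C] \vdash [B \to \bullet, i, i] \mid C >_\ell^* B\}$,

  $D^{Pred} = \{[B \to \alpha \bullet C\beta, i, j] \vdash [j, C]\}$,

  $D^{Scan} = \{[B \to \alpha \bullet a\beta, i, j], [a, j, j+1] \vdash [B \to \alpha a \bullet \beta, i, j+1]\}$,

  $D^{Compl} = \{[B \to \alpha \bullet A\beta, i, j], [A \to \gamma\bullet, j, k] \vdash [B \to \alpha A \bullet \beta, i, k]\}$,

  $D_{sLC} = D^{Init} \cup D^{LC(a)} \cup D^{LC(A)} \cup D^{LC(\varepsilon)} \cup D^{Pred} \cup D^{Scan} \cup D^{Compl}$.

With the set of hypotheses $H$ to be instantiated by (10.1) for any string, we
have fully specified a parsing system $\mathbf{P}_{sLC} = \langle \mathcal{I}_{sLC}, H, D_{sLC} \rangle$ for an arbitrary
grammar $G \in \mathcal{CFG}$. □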
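The deduction steps of sLC can be read almost directly as an agenda-driven recognizer. The following is a minimal sketch, not the implementation discussed in the text: the toy grammar, the tuple encoding of items, and the names `left_corner_closure` and `slc_recognize` are assumptions made for illustration. Terminal items are not stored on the chart; instead the sentence is consulted directly.

```python
from collections import deque

# Hypothetical toy grammar, assumed for illustration only.
# Symbols that never occur as a left-hand side are treated as terminals.
GRAMMAR = {
    "S": [("NP", "VP")],
    "VP": [("*v", "NP")],
    "NP": [("*det", "*n")],
}


def left_corner_closure(grammar):
    """Reflexive, transitive left-corner relation between nonterminals."""
    closure = {a: {a} for a in grammar}
    changed = True
    while changed:
        changed = False
        for a in grammar:
            for b in list(closure[a]):
                for rhs in grammar[b]:
                    if rhs and rhs[0] in grammar and rhs[0] not in closure[a]:
                        closure[a].add(rhs[0])
                        changed = True
    return closure


def slc_recognize(grammar, start, words):
    """Close the chart under the sLC steps; True if a start production spans 0..n."""
    n = len(words)
    lc = left_corner_closure(grammar)
    chart, agenda = set(), deque()

    def add(item):
        if item not in chart:
            chart.add(item)
            agenda.append(item)

    def left_corner_steps(i, goal, corner, j):
        # corner is None for LC(a) and LC(eps), or a completed nonterminal for LC(A).
        for b in lc[goal]:
            for rhs in grammar[b]:
                if corner is None:
                    if not rhs:
                        add(("item", b, rhs, 0, i, i))                 # LC(eps)
                    elif rhs[0] not in grammar and i < n and words[i] == rhs[0]:
                        add(("item", b, rhs, 1, i, i + 1))             # LC(a)
                elif rhs and rhs[0] == corner:
                    add(("item", b, rhs, 1, i, j))                     # LC(A)

    add(("pred", 0, start))                                            # Init
    while agenda:
        cur = agenda.popleft()
        if cur[0] == "pred":
            _, i, goal = cur
            left_corner_steps(i, goal, None, i)
            for other in list(chart):
                if other[0] == "item" and other[3] == len(other[2]) and other[4] == i:
                    left_corner_steps(i, goal, other[1], other[5])
            continue
        _, b, rhs, dot, i, j = cur
        if dot < len(rhs):
            nxt = rhs[dot]
            if nxt in grammar:
                add(("pred", j, nxt))                                  # Pred
                for other in list(chart):                              # Compl
                    if (other[0] == "item" and other[1] == nxt
                            and other[3] == len(other[2]) and other[4] == j):
                        add(("item", b, rhs, dot + 1, i, other[5]))
            elif j < n and words[j] == nxt:
                add(("item", b, rhs, dot + 1, i, j + 1))               # Scan
        else:
            for other in list(chart):
                if other[0] == "pred" and other[1] == i:               # LC(A)
                    left_corner_steps(i, other[2], b, j)
                elif (other[0] == "item" and other[3] < len(other[2])
                        and other[2][other[3]] == b and other[5] == i):
                    add(("item", other[1], other[2], other[3] + 1, other[4], j))  # Compl
    return any(it[0] == "item" and it[1] == start and it[3] == len(it[2])
               and it[4] == 0 and it[5] == n for it in chart)
```

Every new item is put on the chart as soon as it is derived, so each pair of antecedents is combined when the later of the two is taken from the agenda. The sketch favours readability over efficiency: it scans the whole chart for partners rather than indexing items by position.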

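The set of valid items $\mathcal{V}(\mathbf{P}_{sLC})$ for any sentence $a_1 \ldots a_n$ is given by

• $[a, j-1, j]$ is valid iff $a = a_j$.

• $[i, A]$ is valid if

  ◦ $[i, A] = [0, S]$, or

  ◦ there are $k, B, X, \alpha, \beta, \gamma$ such that
    $S \Rightarrow^* a_1 \ldots a_k B\gamma$, $B \Rightarrow X\alpha A\beta$, and $X\alpha \Rightarrow^* a_{k+1} \ldots a_i$

  (cf. Lemma 10.4).

• An Earley item $[A \to \alpha \bullet \beta, i, j]$ is valid if there is a $\gamma$ such that
  $S \Rightarrow^* a_1 \ldots a_i A\gamma$ and $\alpha \Rightarrow^* a_{i+1} \ldots a_j$.

Note, again, that this applies only to items in $\mathcal{I}_{sLC}$, i.e., $\alpha \ne \varepsilon$ or $\alpha = \beta = \varepsilon$.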

The correctness of the above characterization of $\mathcal{V}(\mathbf{P}_{sLC})$ follows
straightforwardly from Theorem 10.8 and the relation between sLC and pLC that will
be established in the next section.



10.5 The relation between pLC, sLC, and LC 

We will now compare the parsing schemata pLC and sLC with LC as defined 
in Example 4.36 and establish relations between these schemata as defined 
in Chapters 5 and 6. 

We have to differentiate between the schemata defined in Chapter 4, called 
basic schemata henceforth, and the more liberal parsing schemata that we 
introduced in this chapter. Items in the domain of a basic parsing schema, 
by definition, are the equivalence classes of a particular relation on the set of 
trees. Hence, the domain of a basic parsing schema is a subset of a partition 
of the set of trees.^ This is not the case for the domains $\mathcal{I}_{pLC}$ and $\mathcal{I}_{sLC}$.

Let us look at sLC first. The Earley items in sLC are identical to the 
items of LC as defined in Example 4.36 (with one exception: the special items
$[S \to \bullet\gamma, 0, 0]$ are not used in sLC). The predict items, on the other hand, should
be regarded as equivalence classes of LC items. The meaning of recognizing a
predict item $[i, A]$ is to denote that some Earley item $[D \to \gamma \bullet A\delta, h, i]$ has been
recognized. By making the item set more sophisticated we have decreased the 
number of deduction steps, notwithstanding the fact that we have increased 
the number of valid items. 

^ Note, however, that a schema may also contain items that denote the empty set;
cf. Section 4.5.

One could argue that the sLC chart parser is an implementation of the
underlying basic parsing schema LC. By adding predict items to the chart
parser we have created a data structure that stores the relevant properties
of items to be used as possible antecedents. Hence, the sLC chart parser is
an optimization of a chart parser directly based on LC, without this extra
data structure. The tree walk of a chart parser based on LC is shown in
Figure 10.6. Fewer items are recognized, but the higher search costs are not
displayed in the figure.




Fig. 10.6. A tree walk according to the schema LC



In a similar way, the pLC chart parser can be seen as an extension of the 
sLC chart parser. LC items are annotated with the predict item that causes
their recognition. This is a useful feature when the predicted symbol carries
attributes that might rule out certain applications of left-corner deduction
steps. For context-free parsing it only increases the number of items. Hence 
this is not an optimization. We have introduced pLC primarily as a step 
towards the definition of the schema pHC in Chapter 11. 

We can apply the relations between parsing schemata that were defined 
in Part II. The definitions of parsing schemata in this chapter are based on 
intuition and not formally derived from the theory in Part II. As a result, 
it roughly holds that sLC is a step refinement of LC and it roughly holds 
that pLC is a step refinement of sLC. “Roughly” means, here, that a few 
inessential details have to be swept under the rug. In order to make things
fit exactly, we will define two auxiliary parsing schemata LC' and pLC'
that differ from LC and pLC only in minute (and for practical purposes
irrelevant) details.

The schema LC’ differs from LC in the following respect: 

• All items of the form $[S \to \bullet\gamma, 0, 0]$ in $\mathcal{I}_{LC}$ are collapsed into a single item
  $[0, S]$ in $\mathcal{I}_{LC'}$; antecedents in deduction steps are adapted accordingly.

Then the (rather trivial) item contraction relation $\mathbf{LC} \stackrel{ic}{\Longrightarrow} \mathbf{LC'}$ holds.

The schema pLC' differs from pLC in the following respect:

• left-corner (A) deduction steps $[C; A \to \gamma\bullet, i, j] \vdash [C; B \to A \bullet \beta, i, j]$ in $D_{pLC}$
  are replaced in $D_{pLC'}$ by left-corner (A) deduction steps
  $[i, C], [C; A \to \gamma\bullet, i, j] \vdash [C; B \to A \bullet \beta, i, j]$.

Then the relation $\mathbf{pLC} \stackrel{df}{\Longrightarrow} \mathbf{pLC'}$ holds, as a dynamic filter may add
antecedents to deduction steps. This is a particularly degenerate case of dynamic
filter (and not worth coining a special name for) as the added antecedents
don't filter anything. An item $[C; A \to \gamma\bullet, i, j]$ cannot be recognized without
having recognized $[i, C]$ before.

Having settled these details, we can now state the desired result. 

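Theorem 10.10. (relations between LC, sLC and pLC)

The following step refinement and item refinement relations hold:

  $\mathbf{LC'} \stackrel{sr}{\Longrightarrow} \mathbf{sLC} \stackrel{ir}{\Longrightarrow} \mathbf{pLC'}$.

Proof. It is clear from the definitions that $\mathcal{I}_{LC'} \subseteq \mathcal{I}_{sLC}$ and it follows
straightforwardly that $\vdash^*_{LC'} \subseteq \vdash^*_{sLC}$, hence $\mathbf{LC'} \stackrel{sr}{\Longrightarrow} \mathbf{sLC}$.

The item contraction function $f : \mathcal{I}_{pLC'} \to \mathcal{I}_{sLC}$ is defined by

  $f([i, A]) = [i, A]$,
  $f([A; B \to \alpha \bullet \beta, i, j]) = [B \to \alpha \bullet \beta, i, j]$.

It follows immediately that $\mathcal{I}_{sLC} = f(\mathcal{I}_{pLC'})$ and $\Delta_{sLC} = f(\Delta_{pLC'})$,
hence $\mathbf{sLC} \stackrel{ir}{\Longrightarrow} \mathbf{pLC'}$. □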
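The effect of item contraction can be illustrated with a small sketch. It assumes that the contraction simply drops the predicted nonterminal from an LC item and leaves predict and terminal items untouched; the tuple encoding and the name `contract` are hypothetical.

```python
def contract(item):
    """Map a pLC' item to an sLC item by dropping the predicted part.

    Assumed encodings (illustrative only):
      ("pred", i, A)                 predict item [i, A]
      ("lc", A, production, dot, i, j)   LC item [A; B -> alpha . beta, i, j]
      ("term", a, j - 1, j)          terminal item [a, j-1, j]
    """
    if item[0] == "lc":
        _, _goal, production, dot, i, j = item
        return ("earley", production, dot, i, j)
    return item  # predict and terminal items are mapped to themselves


# Two LC items that differ only in their predicted part ...
np_for_s = ("lc", "S", ("NP", ("*det", "*n")), 2, 0, 2)
np_for_np = ("lc", "NP", ("NP", ("*det", "*n")), 2, 0, 2)
# ... are contracted to one and the same Earley item.
contracted = {contract(np_for_s), contract(np_for_np)}
```

This is why, for context-free parsing, pLC recognizes more items than sLC without recognizing more constituents: several LC items collapse onto a single Earley item.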

We recall from Corollary 5.8 that item contraction (the inverse of item re- 
finement) is correctness preserving.® Hence, as we have proven pLC correct, 
the correctness of sLC follows. 

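Informally we write $\approx$ for the trivial relations that denote irrelevant
syntactic differences between parsing schemata. Hence, informally, the results of
this section can be summarized as

  $\mathbf{LC} \approx \mathbf{LC'} \stackrel{sr}{\Longrightarrow} \mathbf{sLC} \stackrel{ir}{\Longrightarrow} \mathbf{pLC'} \approx \mathbf{pLC}$.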



10.6 Conclusion 

LC parsing is well known, both in the Computer Science and Computational
Linguistics literature (cf. Rosenkrantz and Lewis [1970], Pratt [1975],
Matsumoto [1983], op den Akker [1988], Resnik [1992], and Nederhof [1993]),
but it is not very common to describe an LC parser as a chart parser. By
doing so, the very close relations between LC parsing and Earley chart parsing
have been made explicit in a simple way (informally in Section 4.6 and
more formally in Example 5.22). In this chapter we have given a somewhat
more convenient description of an LC chart parser, making use of additional
predict items.

® Note, however, that because of the more sophisticated items sLC and pLC do
not belong to the class of semiregular parsing schemata to which Corollary 5.8
applies. The extension with predict items and LC items is inessential, however,
and a similar result for these parsing schemata can be derived from Definition
5.5.

A chart parser is not necessarily the most efficient implementation of the
LC algorithm. Nederhof [1993, 1994a, 1994b] has defined a generalized LC
parser based on a graph-structured stack and discusses further optimizations.
The advantage of describing the LC parser as a chart parser - other than a 
nice application of the framework developed in part II - is that it is a more 
general description. In the next chapter we will introduce parsing schemata 
pHC and sHC for Head-Corner parsers that are straightforward extensions 
of the schemata pLC and sLC.

“Our Latin teachers were apparently right”, Martin Kay [1989] remarks, “You
should start [parsing] with the main verb. This will tell you what kinds of
subjects and objects to look for and what cases they will be in. When you
come to look for these, you should also start by trying to find the main word,
because this will tell you most about what else to look for.”

In this chapter we introduce and analyse a few parsing schemata for Head- 
Corner (HC) parsers, that implement the general idea of head-driven parsing 
as sketched by Kay in his usual lucid style. When it comes down to defining 
the details with mathematical rigor, it is indeed a lot of detail we get involved 
with. Looking at the important words first means jumping up and down the 
sentence. Keeping track of where you have been and where the next interest- 
ing word might be located requires a more sophisticated administration than 
simply working through the sentence from left to right. In order to under- 
stand what is going on, it is of great help to have grasped the ideas behind 
the LC parsers presented in Chapter 10. HC parsing can be seen as a gen- 
eralization of LC parsing - it is just a different corner we start with, all the 
rest is similar (but involves more bookkeeping). As in the previous chapter, 
the mathematical details of correctness proofs and complexity analysis are 
put into separate sections. These can be skipped without losing the thread 
of the discussion. 

Before we start to define Head-Corner parsers, we need to have some 
notion of a head. For this purpose we introduce context-free head grammars 
in Section 11.1. In 11.2 we introduce a predictive HC chart parsing schema 
pHC as a generalization of pLC. The correctness of pHC is proven in 11.3. 
This schema is the basis for two further developments. 

For Head-Corner parsing of context-free grammars, we develop a simpli- 
fied schema sHC in 11.4 (and prove this to be correct in 11.5). A detailed 
complexity analysis in 11.6 will show that, despite the increased sophistica- 
tion in administrative details, the schema can be implemented with a worst- 
case complexity that is as good as that of Graham, Harrison, and Ruzzo’s 
variant of the Earley parser - the optimal worst-case complexity for practical 
context-free parsers known today. The relation between pHC, sHC and the 
parsing schemata of Part II is established in 11.7.

In Section 11.8 we extend the schema pHC to parsing of unification
grammars, using the notation that was developed (and motivated) in Chapter
8. Related approaches are briefly discussed in 11.9; conclusions follow in 11.10.

Like Chapter 10, this chapter is based on cooperative work with Rieks op 
den Akker ([Sikkel and op den Akker, 1992, 1993, 1996]). Section 11.8 is based 
on a Head-Corner parser for unification grammars that has been defined and 
implemented by Margriet Verlinden [1993]. The detailed complexity analysis 
in 11.6, the embedding of the HC parsers in the parsing schemata framework 
in 11.7, and the schema for a HC parser for unification grammars in 11.8 
have not been published before. 



11.1 Context-free Head Grammars 

In order to start parsing a constituent from its head, we have to formally 
introduce the notion of a head. For context-free grammars this is done as 
follows. 

Definition 11.1. (heads in context-free grammars) 

A context-free head grammar is a 5-tuple $G = (N, \Sigma, P, S, h)$, with $h$ a
function that assigns a natural number to each production in $P$.

Let $|p|$ denote the length of the right-hand side of $p$. Then $h$ is constrained
to the following values:

• $h(p) = 0$ for $|p| = 0$,

• $1 \le h(p) \le |p|$ for $|p| > 0$.

The head of a production $p$ is the $h(p)$-th symbol of the right-hand side;
empty productions have head $\varepsilon$. □
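The constraints on $h$ translate directly into a small data structure. The following is a minimal sketch; the class name `Production`, its field names, and the use of the empty string for $\varepsilon$ are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Production:
    """A production p with its head position h(p), counted from 1."""
    lhs: str
    rhs: tuple
    head_index: int  # 0 for empty productions, otherwise 1 <= h(p) <= |p|

    def __post_init__(self):
        if not self.rhs:
            if self.head_index != 0:
                raise ValueError("empty productions must have h(p) = 0")
        elif not 1 <= self.head_index <= len(self.rhs):
            raise ValueError("h(p) must satisfy 1 <= h(p) <= |p|")

    @property
    def head(self):
        """The h(p)-th right-hand side symbol; '' stands for the empty head."""
        return self.rhs[self.head_index - 1] if self.rhs else ""


# Example: a production whose second symbol is the head.
sentence_rule = Production("S", ("NP", "VP"), 2)
```

Validating the head position at construction time means that later code, such as the computation of the head-corner relation, can rely on `head` without further checks.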

In a much more practical notation for head grammars, we do not define 
the function h explicitly, but simply underline the head of each production. 
The head grammar G for our running example is given by 

  $S \to NP\ \underline{VP}$,

  $VP \to \underline{{*}v}\ NP$,

  $NP \to {*}det\ \underline{{*}n}$.

While there is a linguistic motivation for the notion of a head in natural 
language grammars (we come back to this in Section 11.8), this is not the 
case for arbitrary context-free grammars. One could argue that heads are not 
part of the grammar but a function that is attributed to the grammar by 
the designer of the parser. Given a context-free grammar, one could ask the 
question which allocation of heads is optimal for the (worst-case or average- 
case) efficiency of a parser. We will not address such questions here, and 
take the allocation of heads as given. A special case that must be mentioned,
however, is the following:

  $h(p) = 1$ for all nonempty productions $p$,

i.e., the head of each production is the left corner. In that case the HC and
LC chart parser will be identical.^



11.2 A predictive Head-Corner chart parser 

A Left-Corner parser proceeds through the sentence from left to right; a Head-
Corner (HC) parser starts with the more important words, leaving the less
important words to be processed later. How this works in detail is the subject 
of this section. 

For the LC chart parser that was introduced in 10.2 there is no need to 
state that it is predictive. LC parsers have that property by definition. The
bottom-up parsing schema buLC as defined in Chapter 4 is in fact a no- 
tational variant of bottom-up Earley and has been introduced only as an 
auxiliary construct for the derivation of the schema LC. For head-corner 
parsers the inclusion of top-down prediction is not self-evident; it is the com- 
bination of HC chart parsing and top-down prediction that is the innovative 
aspect of the parser presented here. At the same conference where Kay made 
his general statement on head-driven parsing that was quoted in the intro- 
duction to this chapter, Satta and Stock [1989] presented a head-driven chart 
parser that works purely bottom-up. The Head-Corner parser to be presented 
here can roughly be classified as an extension of the Satta and Stock parser 
with top-down prediction as proposed by Kay. 

We introduce the HC chart parser in the same way as the LC chart parser 
in Section 10.2. 

Definition 11.2. (transitive and reflexive head-corner relation)

The relation $>_h$ on $N \times (V \cup \{\varepsilon\})$ is defined by

  $A >_h U$ if there is a production $p = A \to \alpha \in P$ with $U$ the head of $p$.

The transitive and reflexive closure of $>_h$ is denoted $>_h^*$. □

For our trivial example grammar, the relation $>_h^*$ comprises

  $S >_h^* S$,  $S >_h^* VP$,  $S >_h^* {*}v$,
  $VP >_h^* VP$,  $VP >_h^* {*}v$,  $NP >_h^* NP$,  $NP >_h^* {*}n$.
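The closure can be computed by a simple fixed-point iteration. The sketch below is illustrative only: the encoding of productions as `(lhs, rhs, head position)` triples and the name `head_corner_closure` are assumptions, and the non-head symbols of the toy grammar are assumed as well (only the heads influence the result).

```python
# Assumed toy head grammar: (left-hand side, right-hand side, 1-based head position).
HEAD_GRAMMAR = [
    ("S", ("NP", "VP"), 2),
    ("VP", ("*v", "NP"), 1),
    ("NP", ("*det", "*n"), 2),
]


def head_corner_closure(productions):
    """Return the reflexive, transitive closure of the head-corner relation as pairs."""
    nonterminals = {lhs for lhs, _, _ in productions}
    # Direct relation: A >_h U if U is the head of some production for A
    # ('' stands for the head of an empty production).
    direct = {(lhs, rhs[h - 1] if rhs else "") for lhs, rhs, h in productions}
    closure = {(a, a) for a in nonterminals} | direct
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in direct:
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


def lexical_heads(productions, nonterminal):
    """Terminals a with nonterminal >_h^* a."""
    nonterminals = {lhs for lhs, _, _ in productions}
    return {u for a, u in head_corner_closure(productions)
            if a == nonterminal and u and u not in nonterminals}
```

For the toy grammar the closure consists of exactly the seven pairs listed above.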

^ There are some notational differences, of course, caused by the more general
nature of the HC parser. Furthermore, there is a tiny difference in implementation
(pointed out to me by Margriet Verlinden): the HC parsing schemata allow the
parser to leave gaps in carrying out head-corner steps (even though this does not
make sense when all heads are leftmost), whereas the LC parsing schemata do
not allow such gaps.

If the relation $A >_h^* a$ holds between a nonterminal $A$ and a terminal $a$, we
call $a$ a lexical head of $A$. For grammar $G$, lexical heads of a sentence must
be of the category $*v$.

The HC chart parser uses the following kinds of items: 

  $[l, r, A]$: predict items or goals,

  $[l, r, A; B \to \alpha \bullet \beta \bullet \gamma, i, j]$: head-corner (HC) items,

  $[a, j-1, j]$: terminal items as in the Earley chart parser.

The items of the HC chart parser are more complex than the items of the LC 
chart parser, due to the fact that constituents no longer are recognized from 
left to right. Recognition of items should be interpreted as follows. 

• A predict item $[l, r, A]$ is recognized if a constituent $A$ must be looked
  for, located somewhere between $l$ and $r$. Such a constituent should either
  stretch from $l$ up to some $j$ (if we are working to the right from the head
  of some production) or from $r$ down to some $j$ (if we are working to the
  left from the head of some production), with $l \le j \le r$. But, as we start
  parsing $A$ from a lexical head that might be located anywhere between $l$
  and $r$, the distinction between these two cases is irrelevant.

• An HC item $[l, r, A; B \to \alpha \bullet \beta \bullet \gamma, i, j]$ is recognized if $[l, r, A]$ has been set as a
  goal, $A >_h^* B$ holds, and $\beta \Rightarrow^* a_{i+1} \ldots a_j$ has been established. Such an item
  will only be recognized if the head of $B \to \alpha\beta\gamma$ is contained in $\beta$.

In order to get an intuitive idea of what is going on, we will first look at the 
walk through our single parse tree that is performed by the HC chart parser. 
A formal definition is given afterwards. The head-corner tree walk for our 
example is shown in Figure 11.1. It is similar to the left-corner tree walk in
Figure 10.5. There is only one difference: from a nonterminal we first visit
(the subtree with as its root) the head daughter.

By analogy to the LC case, steps down to a nonterminal head are absent.
No steps down need be taken by the algorithm along paths of heads, as these
are encoded in the relation $>_h^*$. Steps down to non-head nonterminal daughters
correspond to setting new goals. The final chart of the head-corner parser
is shown in Figure 11.2. The numbers of the items on the chart correspond 
to the labels of arrows in Figure 10.5. The names of the steps that appear in 
the motivation column should be clear, by analogy to the LC chart parser. 
Note, however, that unlike the LC case, we sometimes proceed in rightward 
direction and sometimes in leftward direction. As a consequence, two different 
cases of scan, complete and predict steps exist. 

Schema 11.3. (pHC) 

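We define a parsing system $\mathbf{P}_{pHC}$ for an arbitrary context-free head grammar
$G$. The domain $\mathcal{I}_{pHC}$ is given by

  $\mathcal{I}^{Pred} = \{[l, r, A] \mid A \in N \wedge 0 \le l \le r\}$,

  $\mathcal{I}^{HC(i)} = \{[l, r, A; B \to \alpha \bullet \beta_1 \underline{X} \beta_2 \bullet \gamma, i, j] \mid A \in N \wedge A >_h^* B \wedge B \to \alpha\beta_1 \underline{X} \beta_2\gamma \in P \wedge 0 \le l \le i \le j \le r\}$,

  $\mathcal{I}^{HC(ii)} = \{[l, r, A; B \to \bullet\bullet, j, j] \mid A \in N \wedge A >_h^* B \wedge B \to \varepsilon \in P \wedge 0 \le l \le j \le r\}$,

  $\mathcal{I}_{pHC} = \mathcal{I}^{Pred} \cup \mathcal{I}^{HC(i)} \cup \mathcal{I}^{HC(ii)}$.

It should be noted that some restrictions are enforced by the definition
of the domain. The left-hand side of the recognized part must be a
transitive/reflexive head of the nonterminal in the goal part. Hence this condition
need not be stated again when we define the deduction steps.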

The set of hypotheses is a formal parameter that can be instantiated for any 
particular sentence. In this case, however, unlike the schema pLC, we need 
to be able to derive the length of the sentence from the set of hypotheses. 
This information is provided by a special end-of-sentence marker. Hence, for 
arbitrary sentences $a_1 \ldots a_n$ a set of hypotheses is defined as

  $H = \{[a_1, 0, 1], \ldots, [a_n, n-1, n], [\$, n, n+1]\}$    (11.1)
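As a small illustration of (11.1), the hypotheses for a given sentence can be built as follows, and the sentence length can be read back from the end-of-sentence marker. The function names and the tuple encoding of terminal items are assumptions made for this sketch.

```python
def hypotheses(words, end_marker="$"):
    """Build the list of terminal items [a_j, j-1, j] plus the end-of-sentence item."""
    items = [(a, j, j + 1) for j, a in enumerate(words)]
    items.append((end_marker, len(words), len(words) + 1))
    return items


def sentence_length(items, end_marker="$"):
    """Recover n from the item [$, n, n+1]."""
    for symbol, left, _right in items:
        if symbol == end_marker:
            return left
    raise ValueError("no end-of-sentence marker among the hypotheses")


example = hypotheses(["*det", "*n", "*v"])
```

The marker item is the only hypothesis that depends on the length of the whole sentence, which is what allows an initialize step to introduce a goal covering the entire string.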

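Fig. 11.2. A completed HC chart (excluding terminal items)

The definition of $D_{pHC}$ looks complicated because of the complexity of the
items and the multitude of different cases. The best way to understand the
definition is to keep in mind that each type of deduction step is a
straightforward extension of a corresponding type of LC deduction step. We distinguish
subsets of $D$ for initialize, terminal head-corner, nonterminal head-corner,
empty head-corner, left predict, right predict, left scan, right scan, left
complete, and right complete deduction steps. The different kinds of head-corner
steps are abbreviated with the symbols $a$, $A$, and $\varepsilon$ as usual.

  $D^{Init} = \{[\$, n, n+1] \vdash [0, n, S]\}$,

  $D^{HC(a)} = \{[l, r, A], [b, j-1, j] \vdash [l, r, A; B \to \alpha \bullet \underline{b} \bullet \gamma, j-1, j]\}$,

  $D^{HC(A)} = \{[l, r, A; C \to \bullet\delta\bullet, i, j] \vdash [l, r, A; B \to \alpha \bullet \underline{C} \bullet \gamma, i, j]\}$,

  $D^{HC(\varepsilon)} = \{[l, r, A] \vdash [l, r, A; B \to \bullet\bullet, j, j]\}$,

  $D^{lPred} = \{[l, r, A; B \to \alpha C \bullet \beta \bullet \gamma, i, j] \vdash [l, i, C]\}$,

  $D^{rPred} = \{[l, r, A; B \to \alpha \bullet \beta \bullet C\gamma, i, j] \vdash [j, r, C]\}$,

  $D^{lScan} = \{[a, j-1, j], [l, r, A; B \to \alpha a \bullet \beta \bullet \gamma, j, k] \vdash [l, r, A; B \to \alpha \bullet a\beta \bullet \gamma, j-1, k]\}$,

  $D^{rScan} = \{[l, r, A; B \to \alpha \bullet \beta \bullet a\gamma, i, j], [a, j, j+1] \vdash [l, r, A; B \to \alpha \bullet \beta a \bullet \gamma, i, j+1]\}$,

  $D^{lCompl} = \{[l, j, C; C \to \bullet\delta\bullet, i, j], [l, r, A; B \to \alpha C \bullet \beta \bullet \gamma, j, k] \vdash [l, r, A; B \to \alpha \bullet C\beta \bullet \gamma, i, k]\}$,

  $D^{rCompl} = \{[l, r, A; B \to \alpha \bullet \beta \bullet C\gamma, i, j], [j, r, C; C \to \bullet\delta\bullet, j, k] \vdash [l, r, A; B \to \alpha \bullet \beta C \bullet \gamma, i, k]\}$,

  $D_{pHC} = D^{Init} \cup D^{HC(a)} \cup D^{HC(A)} \cup D^{HC(\varepsilon)} \cup D^{lPred} \cup D^{rPred} \cup D^{lScan} \cup D^{rScan} \cup D^{lCompl} \cup D^{rCompl}$.

Thus we have fully specified a parsing system $\mathbf{P}_{pHC} = \langle \mathcal{I}_{pHC}, H, D_{pHC} \rangle$ for
an arbitrary context-free head grammar $G$. □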

The chart parser based on pHC does not need the additional hypothesis
$[\$, n, n+1]$. The initial chart contains $[a_1, 0, 1], \ldots, [a_n, n-1, n]$; the initial
agenda is set to $[0, n, S]$ as before. The end-of-sentence marker was included
in the parsing schema only because D, by definition, is independent of (the 
length of) the string that is to be parsed. The chart is initialized for a par- 
ticular given sentence. 

Head-corner parsing of natural language reduces the ambiguity during
the construction of a parse. Recognizing the head of a phrase first enables
a more effective use of feature inheritance for the recognition of other parts
of a phrase. A disadvantage, in the case of context-free grammars, is the
increased complexity caused by the non-sequential way in which the sentence
is processed. Some deduction steps involve 5 position markers, which means
that a straightforward chart parser implementation needs $O(n^5)$ steps in the
worst case. In Section 11.4 we introduce a simplified HC chart parser that
has the usual $O(n^3)$ worst-case complexity. As in the LC case, we split the
items into a predicted part and a recognized part. In this case it is not entirely
trivial, however, that the worst-case complexity is cubic.
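The fifth-power bound can be made concrete by counting the ways in which five ordered position markers can be chosen in a sentence of length $n$, as happens in a complete step whose antecedents and consequent range over positions $l \le i \le j \le k \le r$. The sketch below is only an illustration of this counting argument; the function name is an assumption.

```python
from itertools import product
from math import comb


def count_position_tuples(n):
    """Count tuples (l, i, j, k, r) with 0 <= l <= i <= j <= k <= r <= n by brute force."""
    return sum(
        1
        for l, i, j, k, r in product(range(n + 1), repeat=5)
        if l <= i <= j <= k <= r
    )


# The brute-force count agrees with the closed form C(n + 5, 5),
# a polynomial of degree 5 in n with leading term n^5 / 120.
counts = [count_position_tuples(n) for n in range(6)]
closed_form = [comb(n + 5, 5) for n in range(6)]
```

Since each such tuple can give rise to a constant number of step instances (bounded by the size of the grammar), the number of steps grows with the fifth power of the sentence length.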

Things are different for parsing unification grammars. The usual context- 
free worst-case complexity analysis is of little value. By keeping the predicted 
and recognized part within a single item, the feature structures of both parts
can share substructures. For “reasonable” unification grammars, this should 
be a much more important factor for the efficiency of the algorithm than the 
risk of a worst-case explosion of items. While it is always possible to blow 
up the efficiency of unification grammar parsers with carefully constructed 
nasty grammars, it is simply assumed that natural language grammars are 
not worst-case. In Section 11.8 we discuss an extension of the HC chart parser 
with feature structures. 



11.3 Correctness of the HC chart parser 

Similar to the correctness proof of the LC chart parser, we will first postulate 
a set of viable items and afterwards prove that all viable items and no others 
are valid. 

Definition 11.4. ((pHC-)viable items)

We define pHC-viability (or shortly viability) for each of the types of items
used.

• Let $\Omega$ denote the set of viable predict items. $\Omega$ is the smallest set satisfying
  the following conditions:

  ◦ $[0, n, S] \in \Omega$,

  ◦ if $[l, r, A] \in \Omega$ and there are $B, C, X, \alpha, \beta, \gamma, i, j$ such that
    (i) $A >_h^* B$,
    (ii) $B \to \alpha \underline{X} \beta C \gamma \in P$,
    (iii) $X\beta \Rightarrow^* a_{i+1} \ldots a_j$, and
    (iv) $l \le i \le j \le r$
    then $[j, r, C] \in \Omega$,

  ◦ if $[l, r, A] \in \Omega$ and there are $B, C, X, \alpha, \beta, \gamma, i, j$ such that
    (i) $A >_h^* B$,
    (ii) $B \to \alpha C \beta \underline{X} \gamma \in P$,
    (iii) $\beta X \Rightarrow^* a_{i+1} \ldots a_j$, and
    (iv) $l \le i \le j \le r$
    then $[l, i, C] \in \Omega$.

• A head-corner item $[l, r, A; B \to \alpha \bullet \beta \bullet \gamma, i, j]$ is viable if
    (i) $[l, r, A]$ is viable,
    (ii) $A >_h^* B$,
    (iii) $l \le i \le j \le r$,
    (iv) $\beta \Rightarrow^* a_{i+1} \ldots a_j$, and
    (v) $\beta$ contains the head of $B \to \alpha\beta\gamma$.

• A terminal item $[a, j-1, j]$ is viable if $a = a_j$;
  furthermore, $[\$, n, n+1]$ is viable. □

Note that the definition of viable head-corner items covers both $\mathcal{I}^{HC(i)}$ and
$\mathcal{I}^{HC(ii)}$. If $\beta = \varepsilon$, then $i = j$ and $\varepsilon$ is the head of $B \to \alpha\beta\gamma$.

Unlike the LC case, there is no straightforward direct definition of viability 
of predict items, due to the non-sequential nature of the HC parser. 

Corollary 11.5. An item $[0, n, S; S \to \bullet\beta\bullet, 0, n]$ is pHC-viable for a string
$a_1 \ldots a_n$ if and only if $S \Rightarrow \beta \Rightarrow^* a_1 \ldots a_n$. □

Lemma 11.6.

Any item contained in $\mathcal{V}(\mathbf{P}_{pHC})$ is pHC-viable
(i.e., the HC chart parser is sound).

Proof: straightforward, as Lemma 10.6. □

Lemma 11.7.

All pHC-viable items are contained in $\mathcal{V}(\mathbf{P}_{pHC})$
(i.e., the HC chart parser is complete).

Proof. We follow the same line as in the proof of Lemma 10.7. We define
a function $w$, the tree walk function, that assigns a rank value to all viable
items. In order to prove (by induction on $w$) that each viable item is valid,
it suffices to show that each viable item $\xi$ is the consequent of some
deduction step $\eta_1, \ldots, \eta_k \vdash \xi$ with $w(\eta_i) < w(\xi)$ for $1 \le i \le k$.

We define the function w for each item such that it corresponds to the min- 
imum length of a head-corner walk through a (partial) parse tree that is 
needed to derive the item. We count all edge traversals, also the dotted lines 
in Figure 11.1. The definition of $w$ makes use of parameters $\pi$, $\lambda$, and $\mu$ that
encode the $w$-value of a relevant predict item, the number of edges skipped
by the $>_h^*$ relation, and the length of a derivation of the recognized part. See
the proof of Lemma 10.7 for a more detailed account. 

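The partial function $w : \mathcal{I}_{pHC} \to \mathbb{N}$ is defined by

• $w([0, n, S]) = 0$;

• $$
\begin{aligned}
w([i, j, C]) = \min\bigl(\, & \{\pi + (\lambda + 1) + 2\mu \mid \exists A, B, \alpha, \beta, \gamma, l, h : \\
 & \quad [l, j, A; B \to \alpha{\bullet}\beta{\bullet}C\gamma, h, i] \text{ is viable} \\
 & \quad \wedge\ \pi = w([l, j, A]) \\
 & \quad \wedge\ A >_h^{\lambda} B \\
 & \quad \wedge\ \beta \Rightarrow^{\mu} a_{h+1} \ldots a_i \,\} \\
 \cup\ & \{\pi + (\lambda + 1) + 2\mu \mid \exists A, B, \alpha, \beta, \gamma, r, k : \\
 & \quad [i, r, A; B \to \alpha C{\bullet}\beta{\bullet}\gamma, j, k] \text{ is viable} \\
 & \quad \wedge\ \pi = w([i, r, A]) \\
 & \quad \wedge\ A >_h^{\lambda} B \\
 & \quad \wedge\ \beta \Rightarrow^{\mu} a_{j+1} \ldots a_k \,\} \,\bigr)
\end{aligned}
$$

• $w([l, r, A; B \to \alpha{\bullet}\beta{\bullet}\gamma, i, j]) = \min(\{\pi + \lambda + 2\mu \mid \pi = w([l, r, A]) \wedge A >_h^{\lambda} B \wedge \beta \Rightarrow^{\mu} a_{i+1} \ldots a_j\})$

• $w([i, j, C])$ and $w([l, r, A; B \to \alpha{\bullet}\beta{\bullet}\gamma, i, j])$ are undefined if the conditions in
the preceding two cases cannot be satisfied (i.e., the minimum is taken over
an empty set).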

It is easy to verify that for each viable item there is at least one choice of $\pi$, $\lambda$, $\mu$ for
which the conditions are fulfilled; hence (by induction on the definition of
viability) $w$ is defined for all viable items.

For each viable item $\xi$, by analogy to the left-corner case, one can straightforwardly
find a deduction step $\eta_1, \ldots, \eta_k \vdash \xi$ with $w(\eta_i) < w(\xi)$. We will not
write out the individual cases. □

Theorem 11.8. (correctness of the pHC chart parser)

For any context-free head grammar $G$ and string $a_1 \ldots a_n$ it holds that

$[0, n, S; S \to {\bullet}\beta{\bullet}, 0, n] \in \mathcal{V}(\mathbb{P}_{pHC})$ if and only if $S \Rightarrow \beta \Rightarrow^* a_1 \ldots a_n$.

Proof: directly from Lemmata 11.6 and 11.7 and Corollary 11.5 □ 



11.4 HC chart parsing in cubic time 

The purpose of this section is to derive a variant of the Head-Corner chart
parser that conforms to the usual worst-case complexity bounds for context- 
free chart parsing. In contrast to the LC case, this is not trivial. 

In two steps we change the schema pHC into a simple schema sHC. We 
also spend a few words on further optimizations. The correctness of sHC will 
be proven in Section 11.5, a detailed complexity analysis follows in 11.6. 

As in the left-corner case, we can split the HC items into a predicted
and a recognized part. We will call this schema sHC'. Some more modifications
need to be carried through in order to obtain the desired parsing schema sHC
that can be implemented in cubic time.

We use the following kinds of items:
  $[l, r, A]$: predict items,
  $[B \to \alpha{\bullet}\beta{\bullet}\gamma, i, j]$: double dotted (DD) items,
  $[a, j-1, j]$: terminal items.

The double dotted items have the obvious interpretation. For each such item
that is recognized it will hold that $\beta \Rightarrow^* a_{i+1} \ldots a_j$.

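Schema 11.9. (sHC')

We define a parsing system $\mathbb{P}_{sHC'}$ for an arbitrary context-free head grammar
$G$. The domain $\mathcal{I}_{sHC'}$ is given by

$\mathcal{I}^{Pred} = \{[l, r, A] \mid A \in N \wedge 0 \le l \le r\}$,
$\mathcal{I}^{HC(i)} = \{[B \to \alpha{\bullet}\beta_1 \underline{X} \beta_2{\bullet}\gamma, i, j] \mid B \to \alpha\beta_1\underline{X}\beta_2\gamma \in P \wedge 0 \le i \le j\}$,
$\mathcal{I}^{HC(ii)} = \{[B \to {\bullet}{\bullet}, j, j] \mid B \to \varepsilon \in P \wedge j \ge 0\}$,
$\mathcal{I}_{sHC'} = \mathcal{I}^{Pred} \cup \mathcal{I}^{HC(i)} \cup \mathcal{I}^{HC(ii)}$.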

In pHC only combinations of predicted and recognized parts were considered
with a relevant head-corner relation; this was enforced by the definition
of $\mathcal{I}_{pHC}$. In $\mathcal{I}_{sHC'}$ the predicted and recognized parts have been separated
into different items, hence we must explicitly restrict the deduction steps
to the appropriate cases. Note that a predicted part is needed for the scan
and complete steps; it provides a scope within which the position markers
of the item can be extended by moving the dots outward. In the LC case,
there is no need to monitor such a scope, because the LC schemata proceed
through the sentence in contiguous fashion.

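Thus we obtain the following definition of $D$:

$D^{Init} = \{[\$, n, n+1] \vdash [0, n, S]\}$,

$D^{HC(a)} = \{[l, r, A], [b, j-1, j] \vdash [B \to \alpha{\bullet}\underline{b}{\bullet}\gamma, j-1, j] \mid A >_h^* B \wedge l < j \le r\}$,

$D^{HC(A)} = \{[l, r, A], [C \to {\bullet}\delta{\bullet}, i, j] \vdash [B \to \alpha{\bullet}\underline{C}{\bullet}\beta, i, j] \mid A >_h^* B \wedge l \le i \le j \le r\}$,

$D^{HC(\varepsilon)} = \{[l, r, A] \vdash [B \to {\bullet}{\bullet}, j, j] \mid A >_h^* B \wedge l \le j \le r\}$,

$D^{lPred} = \{[l, r, A], [B \to \alpha C{\bullet}\beta{\bullet}\gamma, i, j] \vdash [l, i, C] \mid A >_h^* B \wedge l \le i \le j \le r\}$,

$D^{rPred} = \{[l, r, A], [B \to \alpha{\bullet}\beta{\bullet}C\gamma, i, j] \vdash [j, r, C] \mid A >_h^* B \wedge l \le i \le j \le r\}$,

$D^{lScan} = \{[l, r, A], [a, j-1, j], [B \to \alpha a{\bullet}\beta{\bullet}\gamma, j, k] \vdash [B \to \alpha{\bullet}a\beta{\bullet}\gamma, j-1, k] \mid A >_h^* B \wedge l < j \le k \le r\}$,

$D^{rScan} = \{[l, r, A], [B \to \alpha{\bullet}\beta{\bullet}a\gamma, i, j], [a, j, j+1] \vdash [B \to \alpha{\bullet}\beta a{\bullet}\gamma, i, j+1] \mid A >_h^* B \wedge l \le i \le j < r\}$,

$D^{lCompl} = \{[l, r, A], [C \to {\bullet}\delta{\bullet}, i, j], [B \to \alpha C{\bullet}\beta{\bullet}\gamma, j, k] \vdash [B \to \alpha{\bullet}C\beta{\bullet}\gamma, i, k] \mid A >_h^* B \wedge l \le i \le k \le r\}$,

$D^{rCompl} = \{[l, r, A], [B \to \alpha{\bullet}\beta{\bullet}C\gamma, i, j], [C \to {\bullet}\delta{\bullet}, j, k] \vdash [B \to \alpha{\bullet}\beta C{\bullet}\gamma, i, k] \mid A >_h^* B \wedge l \le i \le k \le r\}$,

$D_{sHC'} = D^{Init} \cup D^{HC(a)} \cup D^{HC(A)} \cup D^{HC(\varepsilon)} \cup D^{lPred} \cup D^{rPred} \cup D^{lScan} \cup D^{rScan} \cup D^{lCompl} \cup D^{rCompl}$.

With $H$ for an arbitrary sentence as defined in (11.1) we have fully specified a
parsing system $\mathbb{P}_{sHC'} = \langle \mathcal{I}_{sHC'}, H, D_{sHC'} \rangle$ for an arbitrary context-free head
grammar $G$. □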

A chart parser is obtained from the parsing schema as usual. The init step
should be interpreted as initializing the agenda with $[0, n, S]$ for a given
sentence. The end-of-sentence marker is not used by the chart parser. It was
introduced only to specify the schema independently of a particular sentence
length.

The number of items that can be recognized now is $O(n^2)$, but the work
involved for an arbitrary current item is more than linear. Because the complete
steps have 5 position markers, they account for $O(n^5)$ complexity. We will
define a schema sHC as a modification of sHC', in such a way that it can
be implemented with $O(n^3)$ complexity. At the same time we include some
changes that reduce the complexity in terms of the size of the grammar.
These will be discussed at length in 11.6.
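
The relation between the number of position markers and the number of step instances can be checked by brute force. The following is a minimal sketch, not part of the schema: the function name `count_instances` is ours, and it counts non-decreasing tuples of positions in 0..n, which is an upper bound on the number of instantiations of a step with that many position markers (steps with strict inequalities have somewhat fewer).

```python
from itertools import combinations_with_replacement
from math import comb


def count_instances(n, markers):
    """Count non-decreasing tuples of `markers` positions drawn from 0..n."""
    return sum(1 for _ in combinations_with_replacement(range(n + 1), markers))


for n in (4, 8, 16):
    five = count_instances(n, 5)    # e.g. l <= i <= j <= k <= r
    three = count_instances(n, 3)   # e.g. i <= j <= k
    # The counts are binomial coefficients, i.e. polynomials of degree 5 and 3 in n.
    assert five == comb(n + 5, 5) and three == comb(n + 3, 3)
    print(n, five, three)
```

The ratio between the two columns grows quadratically in n, which is the gap between the naive bound and the cubic bound aimed at in this section.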

• By an appropriate change in the definition of $D$ we will reduce the number
of position markers in complete steps to 3 and increase the position markers
involved in a predict step to 5. This leaves $O(n^5)$ as the complexity of
a naive, straightforward implementation of the chart parser. In 11.6, however,
we will argue that all predict steps can be dealt with in $O(n^3)$ time
by adding suitable auxiliary data structures to the implementation.

We will change the schema, such that the following statement holds:

if $[l, r, A] \in \mathcal{V}$ then $[i, j, A] \in \mathcal{V}$ for arbitrary $l \le i \le j \le r$. (11.2)

As a result, we can change the position markers $l$ and $r$ in the complete
steps to $i$ and $j$; similar for the scan steps.

In order to achieve (11.2), however, some more work must be done by the
init and predict steps. Init now simply recognizes $[i, j, S]$ for all applicable
$i$ and $j$.
In the left predict we can replace the steps as defined in sHC' by

$[l, k, A], [B \to \alpha C{\bullet}\beta{\bullet}\gamma, j, k] \vdash [h, i, C]$

with $l \le h \le i \le j$ and $A >_h^* B$. A similar extension of right predict steps
is made. As a consequence, the validity of $[l, k, A]$ implies the validity of
$[h, i, A]$ for intervals located between $l$ and $k$.^ Hence we may restrict the
left complete steps

$[l, r, A], [C \to {\bullet}\delta{\bullet}, i, j], [B \to \alpha C{\bullet}\beta{\bullet}\gamma, j, k] \vdash [B \to \alpha{\bullet}C\beta{\bullet}\gamma, i, k]$

as defined in sHC' to only the cases

$[i, k, A], [C \to {\bullet}\delta{\bullet}, i, j], [B \to \alpha C{\bullet}\beta{\bullet}\gamma, j, k] \vdash [B \to \alpha{\bullet}C\beta{\bullet}\gamma, i, k]$.

Right complete steps are restricted in the same fashion.

In a similar way, we can restrict the position markers in the various HC 
steps. 

• A second change is (a slight modification of) an optimization suggested by
Satta and Stock [1989]. Suppose the grammar has a production $A \to X\underline{Y}Z$.
Furthermore, let $[A \to {\bullet}XYZ{\bullet}, h, k]$ be valid. Then, starting from an item
$[A \to X{\bullet}Y{\bullet}Z, i, j]$ there are two ways to recognize the entire production.
One could start either by moving the left dot leftwards or by moving the
right dot rightwards. Clearly, if the two mentioned items are valid then
$[A \to {\bullet}XY{\bullet}Z, h, j]$ and $[A \to X{\bullet}YZ{\bullet}, i, k]$ must be valid as well.

We will simply discard the second option and state as a general rule that
expansion to the right is allowed only when the left dot is in leftmost
position^.

^ It can be shown that the same condition holds for a less liberal expansion of
the predict rules that takes only 4 position markers. It suffices to add predict
steps

$[h, k, A], [B \to \alpha C{\bullet}\beta{\bullet}\gamma, j, k] \vdash [h, i, C]$

with $h \le i \le j$ and $A >_h^* B$, because for any predicted $[l, k, A]$ it is clear that
$[h, k, A]$ with $l \le h \le k$ can also be predicted; similarly for right predict. But the
extra degree of freedom has no bearing on the complexity of the algorithm (as
we will prove in Section 11.6) and might offer better opportunities for efficient
implementation, because the whole range of predicts can be dealt with in a single
operation, rather than having to do a series of predicts for each applicable value
of $i$.

^ For grammars with rather long right-hand sides (and centrally located heads)
one could think of more sophisticated criteria. Satta and Stock allow expansion
in arbitrary direction and then administrate that the other direction is blocked.
This is a rather academic problem, however; productions with the head neither
in left nor right position are very hard to find, if at all existent.

• A third change that is merely of an administrative nature is the introduction
of a new kind of items. We use CYK items of the form $[A, i, j]$ to
denote that an arbitrary production with left-hand side $A$ has been recognized
between positions $i$ and $j$. This extension has some influence on the
efficiency of the parser, but is also useful to simplify the notation. We may
write $[X, i, j]$ as a generic notation for a completely recognized constituent
that is either a terminal ($[a, i, j]$) or a nonterminal ($[A, i, j]$). Hence, in the
notation of the parsing schema, a scan can be seen now as a special case
of a complete. CYK items $[A, i, j]$ are recognized by pre-complete steps of
the form

$[A \to {\bullet}\beta{\bullet}, i, j] \vdash [A, i, j]$.

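Thus we obtain the following definition for a parsing schema sHC.

Schema 11.10. (sHC)

We define a parsing system $\mathbb{P}_{sHC}$ for an arbitrary context-free head grammar
$G$, incorporating the changes discussed above.

$\mathcal{I}^{Pred} = \{[l, r, A] \mid A \in N \wedge 0 \le l \le r\}$,

$\mathcal{I}^{HC(i)} = \{[B \to \alpha{\bullet}\beta\underline{X}{\bullet}\gamma, i, j] \mid B \to \alpha\beta\underline{X}\gamma \in P \wedge 0 \le i \le j\}$,

$\mathcal{I}^{HC(ii)} = \{[B \to {\bullet}\alpha\underline{X}\beta{\bullet}\gamma, i, j] \mid B \to \alpha\underline{X}\beta\gamma \in P \wedge 0 \le i \le j\}$,

$\mathcal{I}^{HC(iii)} = \{[B \to {\bullet}{\bullet}, j, j] \mid B \to \varepsilon \in P \wedge j \ge 0\}$,

$\mathcal{I}^{CYK} = \{[A, i, j] \mid A \in N \wedge 0 \le i \le j\}$,

$\mathcal{I}_{sHC} = \mathcal{I}^{Pred} \cup \mathcal{I}^{HC(i)} \cup \mathcal{I}^{HC(ii)} \cup \mathcal{I}^{HC(iii)} \cup \mathcal{I}^{CYK}$,

$D^{Init} = \{[\$, n, n+1] \vdash [i, j, S] \mid 0 \le i \le j \le n\}$,

$D^{HC} = \{[i, j, A], [X, i, j] \vdash [B \to \alpha{\bullet}\underline{X}{\bullet}\beta, i, j] \mid A >_h^* B\}$,

$D^{HC(\varepsilon)} = \{[j, j, A] \vdash [B \to {\bullet}{\bullet}, j, j] \mid A >_h^* B\}$,

$D^{lPred} = \{[l, r, A], [B \to \alpha C{\bullet}\beta{\bullet}\gamma, k, r] \vdash [i, j, C] \mid A >_h^* B \wedge l \le i \le j \le k\}$,

$D^{rPred} = \{[l, r, A], [B \to {\bullet}\beta{\bullet}C\gamma, l, i] \vdash [j, k, C] \mid A >_h^* B \wedge i \le j \le k \le r\}$,

$D^{preCompl} = \{[A \to {\bullet}\beta{\bullet}, i, j] \vdash [A, i, j]\}$,

$D^{lCompl} = \{[i, k, A], [X, i, j], [B \to \alpha X{\bullet}\beta{\bullet}\gamma, j, k] \vdash [B \to \alpha{\bullet}X\beta{\bullet}\gamma, i, k] \mid A >_h^* B\}$,

$D^{rCompl} = \{[i, k, A], [B \to {\bullet}\beta{\bullet}X\gamma, i, j], [X, j, k] \vdash [B \to {\bullet}\beta X{\bullet}\gamma, i, k] \mid A >_h^* B\}$,

$D_{sHC} = D^{Init} \cup D^{HC} \cup D^{HC(\varepsilon)} \cup D^{lPred} \cup D^{rPred} \cup D^{preCompl} \cup D^{lCompl} \cup D^{rCompl}$.

Thus we have fully specified a parsing system $\mathbb{P}_{sHC} = \langle \mathcal{I}_{sHC}, H, D_{sHC} \rangle$ for
an arbitrary context-free head grammar $G$. □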
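
To make the interplay of the sHC steps concrete, here is a minimal sketch of a recognizer that simply closes a set of items under the deduction steps of the schema. It is an illustration under assumptions of ours, not the chart parser analysed in this chapter: a production is encoded as a hypothetical triple `(lhs, rhs, head)` with `head` an index into `rhs`, a DD item as `(p, d1, d2, i, j)` with the two dots before `rhs[d1]` and after `rhs[d2 - 1]`, terminals and nonterminals are assumed disjoint, and no agenda or auxiliary table is used, so the running time is far from cubic.

```python
def shc_recognize(productions, start, words):
    """Naive fixpoint over the sHC steps; returns True iff [S, 0, n] is derived."""
    n = len(words)
    nts = {lhs for lhs, _, _ in productions}
    # reach[A] = nonterminals B with A >*_h B (reflexive, transitive closure).
    reach = {a: {a} for a in nts}
    changed = True
    while changed:
        changed = False
        for lhs, rhs, head in productions:
            if rhs and rhs[head] in nts:
                for a in nts:
                    if lhs in reach[a] and not reach[rhs[head]] <= reach[a]:
                        reach[a] |= reach[rhs[head]]
                        changed = True
    pred = {(i, j, start) for i in range(n + 1) for j in range(i, n + 1)}  # init
    spans = {(w, j, j + 1) for j, w in enumerate(words)}  # terminal items
    dd = set()
    while True:
        size = len(pred) + len(spans) + len(dd)
        for l, r, a in list(pred):
            for p, (b, rhs, head) in enumerate(productions):
                if b not in reach[a]:
                    continue
                if not rhs:
                    if l == r:
                        dd.add((p, 0, 0, l, l))            # HC(eps)
                elif (rhs[head], l, r) in spans:
                    dd.add((p, head, head + 1, l, r))      # head-corner step
            for p, d1, d2, i, j in list(dd):
                b, rhs, _ = productions[p]
                if b not in reach[a]:
                    continue
                if d1 > 0 and j == r and l <= i:
                    x = rhs[d1 - 1]
                    if x in nts:                           # left predict
                        pred |= {(s, t, x) for s in range(l, i + 1)
                                 for t in range(s, i + 1)}
                    if (x, l, i) in spans:                 # left complete (or scan)
                        dd.add((p, d1 - 1, d2, l, r))
                elif d1 == 0 and d2 < len(rhs) and i == l and j <= r:
                    x = rhs[d2]
                    if x in nts:                           # right predict
                        pred |= {(s, t, x) for s in range(j, r + 1)
                                 for t in range(s, r + 1)}
                    if (x, j, r) in spans:                 # right complete (or scan)
                        dd.add((p, 0, d2 + 1, l, r))
        for p, d1, d2, i, j in list(dd):
            lhs, rhs, _ = productions[p]
            if d1 == 0 and d2 == len(rhs):
                spans.add((lhs, i, j))                     # pre-complete
        if len(pred) + len(spans) + len(dd) == size:
            return (start, 0, n) in spans


# Toy head grammar (our example): heads are VP, n and v respectively.
G = [("S", ("NP", "VP"), 1), ("NP", ("det", "n"), 1), ("VP", ("v", "NP"), 0)]
print(shc_recognize(G, "S", "det n v det n".split()))
print(shc_recognize(G, "S", "det n v".split()))
```

The right dot is only moved when the left dot is leftmost, which is how the sketch mirrors the restriction adopted from Satta and Stock.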



Although we have established the optimal worst-case complexity bounds 
that could reasonably be obtained (cf. Section 11.6), the efficiency in practical 
cases can be increased a lot by adding more sophistication to the simplified 
chart parser, both at schema level by applying some more filters and at 
implementation level by introducing appropriate data structures. We will 
not further pursue the matter of optimizing the chart parser by application 
of filters, but only give some hints. 

• A predicted item should fit to the left, fit to the right, or both. This can be
expressed by using predict items of the form $[= l, = r, A]$, $[\ge l, = r, A]$ and
$[= l, \le r, A]$ with the obvious interpretation. When looking for an $X$ such
that $A >_h^* X$, one could distinguish (nonexclusively) between cases where

  o $X$ must occur at the left (i.e., if $A \Rightarrow^* \alpha X \beta$ then $\alpha = \varepsilon$),
  o $X$ need not occur at the left (i.e., $A \Rightarrow^* \alpha a X \beta$),

and similarly for right alignment. The head-corner operator can use alignment
information to discard useless valid items.

• A dynamic filter that uses one position look-ahead and one position look- 
back may prevent recognition of a number of useless valid items at fairly 
low cost. 



11.5 Correctness of sHC 

We describe the transformation from pHC to sHC in terms of the relations 
of Chapters 5 and 6. For each step, additionally, we will argue that the 
correctness is preserved. 

As in the LC case (cf. Section 10.5), we define an auxiliary system pHC’, 
which is a trivial dynamic filter of pHC, adding spurious antecedents to
deduction steps that do not filter anything. To each deduction step in HC(A),
lpred, rpred, lcompl, and rcompl, we add an antecedent $[l, r, A]$, reduplicating
the recognized part of the antecedent HC item. It follows trivially that

$\mathcal{V}(\mathbb{P}_{pHC}) = \mathcal{V}(\mathbb{P}_{pHC'})$.

The transformation from sHC’ to sHC cannot be directly expressed in 
the available terminology, and we introduce an auxiliary schema sHC” as an 
intermediate step. The different transformation steps from sHC’ to sHC are 
partly filters and partly refinements. We will define sHC” such that it is a 
refinement of sHC’ and a filter can be applied to obtain sHC. The schema 
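is defined, as usual, by a parsing system $\mathbb{P}_{sHC''}$ for an arbitrary context-free
head grammar $G$:

$\mathcal{I}^{Pred} = \{[l, r, A] \mid A \in N \wedge 0 \le l \le r\}$,

$\mathcal{I}^{HC(i)} = \{[B \to \alpha{\bullet}\beta_1\underline{X}\beta_2{\bullet}\gamma, i, j] \mid B \to \alpha\beta_1\underline{X}\beta_2\gamma \in P \wedge 0 \le i \le j\}$,

$\mathcal{I}^{HC(ii)} = \{[B \to {\bullet}{\bullet}, j, j] \mid B \to \varepsilon \in P \wedge j \ge 0\}$,

$\mathcal{I}^{CYK} = \{[A, i, j] \mid A \in N \wedge 0 \le i \le j\}$,

$\mathcal{I}_{sHC''} = \mathcal{I}^{Pred} \cup \mathcal{I}^{HC(i)} \cup \mathcal{I}^{HC(ii)} \cup \mathcal{I}^{CYK}$,

$D^{Init} = \{[\$, n, n+1] \vdash [i, j, S] \mid 0 \le i \le j \le n\}$,

$D^{HC} = \{[l, r, A], [X, i, j] \vdash [B \to \alpha{\bullet}\underline{X}{\bullet}\beta, i, j] \mid A >_h^* B \wedge l \le i \le j \le r\}$,

$D^{HC(\varepsilon)} = \{[l, r, A] \vdash [B \to {\bullet}{\bullet}, j, j] \mid A >_h^* B \wedge l \le j \le r\}$,

$D^{lPred} = \{[l, r, A], [B \to \alpha C{\bullet}\beta{\bullet}\gamma, k, r] \vdash [i, j, C] \mid A >_h^* B \wedge l \le i \le j \le k\}$,

$D^{rPred} = \{[l, r, A], [B \to \alpha{\bullet}\beta{\bullet}C\gamma, l, i] \vdash [j, k, C] \mid A >_h^* B \wedge i \le j \le k \le r\}$,

$D^{preCompl} = \{[A \to {\bullet}\beta{\bullet}, i, j] \vdash [A, i, j]\}$,

$D^{lCompl} = \{[l, r, A], [X, i, j], [B \to \alpha X{\bullet}\beta{\bullet}\gamma, j, k] \vdash [B \to \alpha{\bullet}X\beta{\bullet}\gamma, i, k] \mid A >_h^* B \wedge l \le i \le k \le r\}$,

$D^{rCompl} = \{[l, r, A], [B \to \alpha{\bullet}\beta{\bullet}X\gamma, i, j], [X, j, k] \vdash [B \to \alpha{\bullet}\beta X{\bullet}\gamma, i, k] \mid A >_h^* B \wedge l \le i \le k \le r\}$,

$D_{sHC''} = D^{Init} \cup D^{HC} \cup D^{HC(\varepsilon)} \cup D^{lPred} \cup D^{rPred} \cup D^{preCompl} \cup D^{lCompl} \cup D^{rCompl}$.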

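Theorem 11.11. (Correctness of sHC)

The following relations hold:

$\mathrm{pHC} \stackrel{\mathrm{df}}{\Longrightarrow} \mathrm{pHC'} \stackrel{\mathrm{ic}}{\Longrightarrow} \mathrm{sHC'} \stackrel{\mathrm{sr}}{\Longrightarrow} \mathrm{sHC''} \stackrel{\mathrm{sf}}{\Longrightarrow} \mathrm{sHC}$.

Moreover, each of these parsing schemata is correct.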

Proof 

• The correctness of pHC was established in Theorem 11.8, the correctness 
of pHC’ follows from the above argument. 

• The item contraction from pHC' to sHC' is similar to the LC case; item
contraction preserves correctness.^

• In schema sHC'' we have inserted CYK items and pre-complete steps.
These constitute a straightforward step refinement. A second step refinement
is the recognition of extra predict items $[i, j, A]$ with $l \le i \le j \le r$ for
each recognized predict item $[l, r, A]$. In sHC'' these items are spurious,
however, because we have not discarded any complete step. It is easy to
show (by induction on the length of the derivation from the hypotheses)
that if a DD item $\xi \in \mathcal{V}(\mathbb{P}_{sHC''})$ then also $\xi \in \mathcal{V}(\mathbb{P}_{sHC'})$.
The reverse is trivial. Hence, from the correctness of sHC' it follows that
sHC'' is correct.

• The transformation from sHC'' to sHC consists of two static filters.
Firstly, trimming down the complete steps and head-corner steps to the
case in which the predict item spans exactly the consequent (i.e., $l = i$ and
$r = k$ in the complete steps) is a mere redundancy elimination; the set of
valid items is not affected. Secondly, the Satta and Stock filter removes some of the
DD items of the form $[A \to \alpha{\bullet}\beta{\bullet}\gamma, i, j]$ but, evidently, the validity of DD
items of the form $[A \to {\bullet}\beta{\bullet}, i, j]$ is not affected. Hence sHC is correct as
well. □

^ That is, when Corollary 5.8 is extended to the type of parsing systems we deal
with here; cf. footnote 8 in Chapter 10, page 219.



11.6 Complexity analysis of sHC 

We will first do a complexity analysis in terms of the sentence length only.
After having shown that an implementation in $O(n^3)$ time is possible, we
also pay attention to the size of the grammar as a complexity factor. We
obtain the same worst-case complexity bounds as the GHR algorithm, which
proves that the additional sophistication of an HC parser does not lead to an
increase in formal complexity.

The space complexity is $O(n^2)$, obviously, because each type of item contains
two position markers. An upper bound for the time complexity can be
estimated by assuming that each of the $O(n^2)$ possible valid items will trigger
each applicable type of deduction step.

All head-corner steps contribute $O(n^2)$. A (non-empty) head-corner step
can be triggered in two different ways: either by taking $[i, j, A]$ or by taking
$[X, i, j]$ from the agenda.

All complete steps, similarly, contribute a factor $O(n^3)$. A complete step
can be triggered in three different ways: by taking each kind of item from the
agenda and searching the chart for the two other items. (It is rather unlikely,
but nevertheless possible, that a predict item taken from the agenda will
trigger a scan/complete step that produces a hitherto unrecognized item. We
will not look for optimization in this respect; our prime concern now is cubic
time complexity.)

The hard case is the set of $O(n^5)$ predict steps. Let us have a closer look
at left predict steps, having the form

$[l, r, A], [B \to \alpha C{\bullet}\beta{\bullet}\gamma, k, r] \vdash [i, j, C]$

with $l \le i \le j \le k \le r$. We define an invocation of a left predict as a situation
in which one antecedent is taken from the agenda and a corresponding
antecedent is found on the chart. An invocation,

$[l, r, A], [B \to \alpha C{\bullet}\beta{\bullet}\gamma, k, r] \vdash \ldots$ (11.3)

corresponds to a set of left predict steps for appropriate $i$ and $j$ values of the
consequent. It is irrelevant whether $[l, r, A]$ comes from the agenda and the
item $[B \to \alpha C{\bullet}\beta{\bullet}\gamma, k, r]$ is already present on the chart or the reverse. Only a
cubic number of different possibilities exist, hence at most $O(n^3)$ invocations
occur.

At each invocation, however, there are in general $O(n^2)$ different consequents.
Thus a total number of $O(n^5)$ times a consequent is computed, looked
for in chart and agenda, and added if not yet present. As only $O(n^2)$ different
consequents of left predict steps exist, some wastage can be avoided with a
more sophisticated book-keeping technique.

We call an invocation of the form (11.3) successful if $[l, k, C]$ is neither
present on the chart nor on the agenda and unsuccessful if $[l, k, C]$ is already
present on the chart or pending on the agenda. In the latter case, every
$[i, j, C]$ with $l \le i \le j \le k$ must also be present in chart or agenda.

There are at most $O(n^3)$ unsuccessful invocations, for each combination of
position markers $l$, $k$, $r$. For each unsuccessful invocation only a constant amount
of work needs to be done (i.e., verifying that $[l, k, C]$ has indeed been recognized
already).

The number of successful invocations, on the other hand, is limited to $O(n^2)$,
because only $O(n^2)$ different predict items exist. The amount of work that is
carried out by an individual successful invocation is possibly quadratic. The
fact that matters here, however, is that the total amount of work to be done
by all successful invocations must not be more than cubic. This is established
as follows.

We will give an informal example, rather than a formal proof. The predict
items are stored in a table in the form of an upper triangular matrix, indexed
by the position markers (like a CYK matrix). The item $[i, j, A]$ is represented
by writing an $A$ in table entry $T_{i,j}$. The matrix contains both the chart and
the agenda (the agenda could be represented, for example, by keeping a linked
list of matrix entries). Suppose we predict

$[l, r, A], [B \to \alpha C{\bullet}\beta{\bullet}\gamma, k, r] \vdash \ldots$

and we have a predict table that already contains some entries for $C$ as shown
in Figure 11.3. Clearly, one only has to add $C$'s to all table entries marked *.

Fig. 11.3. Table entries to which a C must be added



It is obvious that the total amount of new $C$'s added in this way is
quadratic: the table is only quadratic in size. Unfortunately, however, things
are slightly more complicated. On top of adding $C$'s to the indicated positions,
one also has to find out that the other positions left/down from the
starting point $T_{l,k}$ do contain $C$'s already. To that end, we check the matrix
column by column. In each column we may stop when we hit a field containing
a $C$. Moreover, if we hit upon a column that contains a $C$ already in the
first position we are interested in, no further columns need be checked.

We call an access to a matrix entry a successful access if no $C$ is present
yet and an unsuccessful access if it contains a $C$ already. The total number of
successful accesses is clearly quadratic. The total number of unsuccessful
accesses is estimated as follows: for each successful invocation, a linear number
of columns is checked, leading to $O(n)$ unsuccessful accesses (see Figure 11.4).
Hence the total number of unsuccessful accesses is at most $O(n^3)$.

Fig. 11.4. Unsuccessful accesses ('+')

Thus, in summary, we have in the worst case

• $O(n^2)$ successful accesses by successful invocations

• $O(n^3)$ unsuccessful accesses by successful invocations

• $O(n^3)$ unsuccessful invocations; for each one a single unsuccessful access.
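
The column-by-column bookkeeping described above can be sketched as follows. This is an illustration under our own assumptions, not the author's implementation: the predict table is a hypothetical list of lists of sets, `add_predictions` is a name we introduce, and the table is assumed to satisfy (11.2), i.e. whenever a symbol is in entry (i, j) it is also in every entry (i', j') with i <= i' <= j' <= j.

```python
def add_predictions(table, symbol, l, k):
    """Add `symbol` to every entry table[i][j] with l <= i <= j <= k.

    Returns (successful, unsuccessful) access counts. Relies on the invariant
    that the entries holding `symbol` are closed under taking sub-intervals.
    """
    successful = unsuccessful = 0
    for j in range(k, l - 1, -1):          # columns, right to left
        stop = False
        for i in range(l, j + 1):          # rows, top to bottom
            if symbol in table[i][j]:
                unsuccessful += 1          # everything below is filled already
                stop = (i == l)            # hit in the first row: all remaining
                break                      # columns are filled as well
            table[i][j].add(symbol)
            successful += 1
        if stop:
            break
    return successful, unsuccessful


n = 4
table = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
print(add_predictions(table, "C", 1, 3))   # first prediction for [1, 3, C]
print(add_predictions(table, "C", 0, 4))   # overlaps the entries filled above
print(add_predictions(table, "C", 0, 4))   # an unsuccessful invocation
```

Each call performs at most one unsuccessful access per column, which is the linear bound per successful invocation used in the argument above.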



We will now include the size of the grammar in the complexity analysis.
The size of the grammar can be captured in a single figure, denoted $|G|$, which
is obtained by counting every left-hand-side symbol and every right-hand-side
symbol in every production:

$|G| = \sum_{A \to \alpha \in P} (1 + |\alpha|)$. (11.4)

For a more refined analysis, we use $|N|$: the number of nonterminals, $|V|$: the
number of terminal and nonterminal symbols, $|P|$: the number of productions
and g: the length of the longest right-hand side of any production. For some
technical computations we also need another, rather ad hoc parameter $h$: the
maximum number of productions that have an identical non-empty head.
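
As a small worked illustration of these parameters, the sketch below computes them for a grammar given as a list of productions. The encoding `(lhs, rhs, head)` with `head` an index into `rhs`, the function name `grammar_measures` and the dictionary keys are all our own assumptions; the example grammar is the family $P_k$ that appears in a footnote later in this section, instantiated for k = 3.

```python
from collections import Counter


def grammar_measures(productions):
    """Return |G|, |N|, |V|, |P|, the longest right-hand side and h."""
    nonterminals = {lhs for lhs, _, _ in productions}
    symbols = nonterminals | {x for _, rhs, _ in productions for x in rhs}
    # Count how many productions share each (non-empty) head symbol.
    heads = Counter(rhs[head] for _, rhs, head in productions if rhs)
    return {
        "G": sum(1 + len(rhs) for _, rhs, _ in productions),   # formula (11.4)
        "N": len(nonterminals),
        "V": len(symbols),
        "P": len(productions),
        "longest_rhs": max(len(rhs) for _, rhs, _ in productions),
        "h": max(heads.values(), default=0),
    }


k = 3
P_k = ([("S", (f"A{i}",), 0) for i in range(1, k + 1)]
       + [(f"A{i}", ("B",), 0) for i in range(1, k + 1)]
       + [("B", ("a",), 0)])
print(grammar_measures(P_k))
```

For this family h grows with k because all productions $A_i \to B$ share the head $B$, while $|G|$ grows only linearly in k.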

In order to determine the space complexity, we list the various tables that 
are used by the parser. 

• The chart and agenda are stored in tables of size $O(|G|n^2)$.

• It is assumed that the relation $>_h^*$ is available in tabular form (if not, this
has repercussions on the time complexity). This table consumes $O(|N||V|)$
space.

• The predict table as discussed above takes $O(|N|n^2)$ space.

• We use a table in which we can find all productions for a given head. (Not
relevant for the space complexity).

• We also use a dotted rules table which, for a given nonterminal $A$, yields
all double dotted rules of the forms $B \to \alpha A{\bullet}\beta{\bullet}\gamma$ and $B \to {\bullet}\beta{\bullet}A\gamma$ that are
used in DD items in sHC. (Not relevant for the space complexity).

Hence we obtain a total space complexity

$O(|N||V| + |G|n^2)$.
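
The table for the head-corner relation can be precomputed once per grammar. The sketch below assumes, as our reading of the relation used throughout this chapter, that $A >_h X$ holds when $X$ is the head of some production for $A$ (with the empty string standing in for $\varepsilon$ in an empty production) and that $>_h^*$ is its reflexive and transitive closure. The function name and the `(lhs, rhs, head)` encoding are hypothetical.

```python
def head_corner_closure(productions):
    """Map each nonterminal A to the set of symbols X with A >*_h X."""
    direct = {}
    for lhs, rhs, head in productions:
        # "" stands for epsilon, the head of an empty production.
        direct.setdefault(lhs, set()).add(rhs[head] if rhs else "")
    closure = {a: {a} | heads for a, heads in direct.items()}
    changed = True
    while changed:                      # simple fixpoint iteration
        changed = False
        for a in closure:
            for x in list(closure[a]):
                extra = closure.get(x, set()) - closure[a]
                if extra:
                    closure[a] |= extra
                    changed = True
    return closure


# Toy head grammar (our example): heads are VP, n and v respectively.
G = [("S", ("NP", "VP"), 1), ("NP", ("det", "n"), 1), ("VP", ("v", "NP"), 0)]
for a, xs in sorted(head_corner_closure(G).items()):
    print(a, sorted(xs))
```

The result has one row per nonterminal and at most one column per symbol, which matches the $O(|N||V|)$ space figure quoted above.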

The time complexity for each type of deduction step is determined as 
follows. For every type of antecedent we multiply the maximal number of 
antecedents of that type with the time complexity of searching for applicable 
fellow antecedents and the recognition of (the appropriate set of) consequents. 







head corner: we distinguish three cases for head corners $C$, $a$ and $\varepsilon$.

(i) for $A >_h^* B$, $B \to \alpha\underline{a}\beta \in P$, $0 < j \le n$:

  $[j-1, j, A], [a, j-1, j] \vdash [B \to \alpha{\bullet}a{\bullet}\beta, j-1, j]$,

(ii) for $A >_h^* B$, $B \to \alpha\underline{C}\beta \in P$, $0 \le i \le j \le n$:

  $[i, j, A], [C, i, j] \vdash [B \to \alpha{\bullet}C{\bullet}\beta, i, j]$,

(iii) for $A >_h^* B$, $B \to \varepsilon \in P$, $0 \le j \le n$:

  $[j, j, A] \vdash [B \to {\bullet}{\bullet}, j, j]$.

The only case that is relevant for complexity bounds is (ii).

$O(|N|n^2)$ predict items each invoke a search over the $|P|$ productions;
checking whether a head has been recognized needs constant time, yielding
$O(|N||P|n^2)$.

The sub-case where the rule is triggered by a CYK item is somewhat more
difficult. For each of the $O(|N|n^2)$ CYK items at most $h$ productions have to
be considered, for each of which an $O(|N|)$ match with predict items has to
be attempted, yielding $O(|N|^2 h n^2)$.

predict: for $A >_h^* B$, $0 \le l \le i \le j \le k \le r \le n$:

  $[l, r, A], [B \to \alpha C{\bullet}\beta{\bullet}\gamma, k, r] \vdash [i, j, C]$,

  $[l, r, A], [B \to {\bullet}\beta{\bullet}C\gamma, l, i] \vdash [j, k, C]$;

We have dealt with the five position markers above. For computing the
complexity, here, we simply assume that invocations are unsuccessful. The
work caused by the $O(n^2)$ successful accesses in successful invocations is
counted separately (and contributes only $O(1)$ in each case). Hence we obtain
$O(|N|n^2)$ invocations triggered by predict items, causing an $O(|G|n)$
search for applicable DD items. When the predict is triggered by one of the
$O(|G|n^2)$ DD items, an $O(|N|n)$ search finds the appropriate predict items.
Hence the complexity is $O(|N||G|n^3)$.

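pre-complete: for $0 \le i \le j \le n$:

  $[A \to {\bullet}\alpha{\bullet}, i, j] \vdash [A, i, j]$

scan: for $A >_h^* B$ and all applicable positions $i$, $j$, $k$ between $0$ and $n$:

  $[j-1, k, A], [a, j-1, j], [B \to \alpha a{\bullet}\beta{\bullet}\gamma, j, k] \vdash [B \to \alpha{\bullet}a\beta{\bullet}\gamma, j-1, k]$
  $[i, j+1, A], [B \to {\bullet}\beta{\bullet}a\gamma, i, j], [a, j, j+1] \vdash [B \to {\bullet}\beta a{\bullet}\gamma, i, j+1]$

Pre-complete and scan are not relevant for the time complexity bounds.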
complete: for $0 \le i \le j \le k \le n$:

  $[i, k, A], [C, i, j], [B \to \alpha C{\bullet}\beta{\bullet}\gamma, j, k] \vdash [B \to \alpha{\bullet}C\beta{\bullet}\gamma, i, k]$

  $[i, k, A], [B \to {\bullet}\beta{\bullet}C\gamma, i, j], [C, j, k] \vdash [B \to {\bullet}\beta C{\bullet}\gamma, i, k]$

$O(|N|n^2)$ predict items trigger $O(|G|n)$ work;
$O(|G|n^2)$ DD items trigger $O(|N|n)$ work.

For the $O(|N|n^2)$ CYK items the accounting is slightly more complicated.
If a complete is triggered by a CYK item, we first search for relevant DD
items. The number of different DD items that can match a CYK item will
differ according to the nonterminal in the CYK item. The relevant DD items
can be found with the dotted rule table and checked for on the chart and
agenda in constant time per DD item. Hence, if we count all completes triggered
by CYK items, rather than individual cases, we find a total of $O(|G|n^3)$
combinations of CYK and DD items. In each case, an $O(|N|)$ search for an
applicable predict item has to be carried out.

Thus we find a total time complexity of $O(|N||G|n^3)$ for the complete
operation.

In summary, for the head-corner chart parser that implements sHC we
find a total time complexity

$O(|N|^2 h n^2 + |N||G|n^3)$.

Theorem 11.12. (complexity of sHC)

Let $h$ denote the maximum number of productions having the same head.
Assuming^ that $O(|N|h) \le O(|G|n)$, the parsing schema sHC can be
implemented using

$O(|N||V|) + O(|G|n^2)$ space, and
$O(|N||G|n^3)$ time.

Proof. Direct from the above discussion. □

How does this result relate to the complexity of standard parsing algorithms?
The practically optimal complexity bounds® that have been established
so far are $O(|G|n^3)$ for parsers without prediction and $O(|N||G|n^3)$ for
parsers with prediction [Graham, Harrison, and Ruzzo, 1980]. We will briefly
explain this.

^ Counterexamples to this assumption are, e.g., the grammars defined by
$P_k = \{S \to A_i \mid 1 \le i \le k\} \cup \{A_i \to B \mid 1 \le i \le k\} \cup \{B \to a\}$
with $O(|N|h) = k^2 > k = O(|G|n)$. It is clear, though, that the assumption
holds for any reasonable grammar that has not been specifically designed as a
counterexample.

® Less than cubic time complexity bounds have been established by Valiant [1975].
This result has only theoretical value, however. The constants involved are so
large that conventional cubic-time parsing algorithms perform much better than
Valiant's algorithm on any realistic parsing problem.

Consider the improved Earley chart parser defined by Graham, Harrison
and Ruzzo [1980]. Parsing schemata GHR and buGHR were defined in
Examples 6.18 and 6.19, respectively, for the predictive and bottom-up variant.
The GHR parsers can be extended with a pre-complete similar to the
one introduced here. Whenever an item $[A \to \gamma{\bullet}, i, j]$ is recognized, we store
in a separate CYK table the item $[A, i, j]$. Hence the most complex set of
deduction steps, the complete steps, are, for $0 \le i \le j \le k \le n$:

$[A \to \alpha{\bullet}B\beta, i, j], [B, j, k] \vdash [A \to \alpha B{\bullet}\beta, i, k]$

There are $O(|G|n^2)$ active items, causing $O(n)$ work each to search for the
appropriate $[B, j, k]$, yielding a total of $O(|G|n^3)$ for complete steps triggered
by active items.

There are $O(|N|n^2)$ passive (CYK) items, causing $O(|G|n)$ work each to
search the appropriate $[A \to \alpha{\bullet}B\beta, i, j]$, yielding a total of $O(|G||N|n^3)$ for
complete steps triggered by passive items. This last figure determines the
complexity of the conventional GHR algorithm with prediction.

It is possible to reduce the complexity of GHR by a factor $|N|$, by making
sure (in our terminology) that all complete steps are triggered by active
items. If passive items do not have to look around for matching active items,
one only has a complexity of $O(|G|n^3)$. In the bottom-up variant, without
prediction, this is accomplished by appropriate scheduling. When an item of
the form $[A \to \alpha{\bullet}B\beta, i, j]$ is found, searching for some particular $[B, j, k]$ is
deferred to the moment that all items with position markers $j$ and $k$ have
been found.

In the conventional GHR parser with top-down prediction this type of 
scheduling is not possible. Hence it follows that top-down filtering - which will 
decrease the sequential computation time in ordinary cases - has a negative 
effect on the worst-case complexity. 

Thus we have shown that the parsing schema sHC can be implemented
with the same worst-case time complexity as the GHR algorithm, which
has the optimal known worst-case complexity bounds. The GHR complexity
bounds can be improved by a factor $|N|$ if the top-down prediction is
discarded. The same applies to Head-Corner parsing. In 11.7 we will define a
bottom-up HC parsing schema buHC (as an intermediate step in the derivation
of sHC from dVH). It can be verified straightforwardly that buHC
can be implemented in $O(|G|n^3)$ time. The only additional complexity factor
involved in HC parsing might be the $O(|N||V|)$-sized table to store the
relation $>_h^*$.



11.7 The relation between pHC, sHC, and dVH 

Comparing head-corner parsing schemata with other schemata defined in 
previous sections and chapters meets with the formal problem that context- 
free head grammars are different from context-free grammars. So, in order 
to define relations between context-free parsing schemata and head-corner
parsing schemata, one should first extend the context-free parsing schemata
to (context-free) head grammars. A generic way to do this is the following:

apply the context-free schema as usual and ignore the head function. 

Thus one obtains proper generalizations in the sense of Chapter 5. Yet this 
does not seem quite right. The problem is that head grammars are not just 
a generalization of context-free grammars; head grammars are a somewhat 
ad-hoc formalism that has been designed with the specific purpose of using 
linguistic head information to guide an otherwise context-free parser. The 
extension of a grammar G to a head grammar G makes sense only if the 
concept head is used in some way or other. 

An equally gratuitous solution is simply to state that

Every context-free grammar is considered equivalent to a head grammar
with the head function h limited to the values 0 and 1 (i.e., all
heads are left corners).

From this perspective, it follows easily that HC schemata are generalizations
of LC schemata. Yet this is not satisfying either. The notion of a head is
simply nonexistent in context-free grammars and there is no a priori reason 
why heads should be allocated to left corners. If heads are to be used it seems 
more proper to allocate heads in some meaningful way. From that perspective, 
HC schemata are not generalizations of LC schemata. 

The HC schemata can be embedded in the theory of Chapters 4 and 5 if 
we simply add a head function to a context-free grammar (and do not ask 
the question how the heads were allocated). As a starting point we take the
parsing schema dVH1. In Chapter 6 we have transformed dVH1 to buLC
and then applied a dynamic filter to obtain LC. A basic parsing schema (i.e.,
every tree occurs only in one item) for bottom-up Head-Corner parsing can
be straightforwardly derived from dVH1. Rather than the static filter of
De Vreught and Honig, which implies that right-hand sides of productions 
are processed from left to right, we apply the same strategy as in sHC, i.e., 
starting from the head we work right-to-left, until a prefix of the right-hand 
side has been obtained. Subsequently, the remainder is done in left-to-right 
fashion. 

Schema 11.13. (buHC) 

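A parsing system $\mathbb{P}_{buHC}$ for an arbitrary context-free head grammar $G$ is
defined by

\[
\begin{array}{lcl}
\mathcal{I}^{HC(i)} & = & \{[B \to \alpha\bullet\beta\underline{X}\bullet\gamma,\, i, j] \mid B \to \alpha\beta\underline{X}\gamma \in P \;\wedge\; 0 \le i \le j\}, \\
\mathcal{I}^{HC(ii)} & = & \{[B \to \bullet\alpha\underline{X}\beta\bullet\gamma,\, i, j] \mid B \to \alpha\underline{X}\beta\gamma \in P \;\wedge\; 0 \le i \le j\}, \\
\mathcal{I}^{HC(iii)} & = & \{[B \to \bullet\bullet,\, j, j] \mid B \to \varepsilon \in P \;\wedge\; j \ge 0\}, \\
\mathcal{I}_{buHC} & = & \mathcal{I}^{HC(i)} \cup \mathcal{I}^{HC(ii)} \cup \mathcal{I}^{HC(iii)}; \\[6pt]
D^{Init} & = & \{[\$, n, n+1] \vdash [0, n, S]\}, \\
D^{HC(a)} & = & \{[a, j-1, j] \vdash [B \to \alpha\bullet\underline{a}\bullet\gamma,\, j-1, j]\}, \\
D^{HC(A)} & = & \{[A \to \bullet\beta\bullet,\, i, j] \vdash [B \to \alpha\bullet\underline{A}\bullet\gamma,\, i, j]\}, \\
D^{HC(\varepsilon)} & = & \{\vdash [B \to \bullet\bullet,\, j, j]\}, \\
D^{lScan} & = & \{[a, j-1, j],\ [B \to \alpha a\bullet\beta\bullet\gamma,\, j, k] \vdash [B \to \alpha\bullet a\beta\bullet\gamma,\, j-1, k]\}, \\
D^{rScan} & = & \{[B \to \bullet\beta\bullet a\gamma,\, i, j],\ [a, j, j+1] \vdash [B \to \bullet\beta a\bullet\gamma,\, i, j+1]\}, \\
D^{lCompl} & = & \{[A \to \bullet\delta\bullet,\, i, j],\ [B \to \alpha A\bullet\beta\bullet\gamma,\, j, k] \vdash [B \to \alpha\bullet A\beta\bullet\gamma,\, i, k]\}, \\
D^{rCompl} & = & \{[B \to \bullet\beta\bullet A\gamma,\, i, j],\ [A \to \bullet\delta\bullet,\, j, k] \vdash [B \to \bullet\beta A\bullet\gamma,\, i, k]\}, \\[6pt]
D_{buHC} & = & D^{Init} \cup D^{HC(a)} \cup D^{HC(A)} \cup D^{HC(\varepsilon)} \cup D^{lScan} \cup D^{rScan} \cup D^{lCompl} \cup D^{rCompl},
\end{array}
\]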

with H as in (11.1) on page 225. □ 

This schema has been obtained from dVH1 by the following transformations:

• the concatenate has been contracted with an init step (yielding left/right
scan) and with an include step (yielding left/right complete);

• a static filter restricts the init and include steps of de Vreught and Honig
to heads only (yielding HC(a) and HC(A));

• (our version of) the Satta and Stock filter (cf. page 232) has been applied.

Hence, dVH1 $\Longrightarrow$ buHC. It is possible to add top-down filtering to
buHC and define a basic parsing schema HC. From this, similar to the LC
case, sHC and pHC can be derived by introducing higher-level items. The
number of different cases in HC is embarrassingly high, however, and we will
not take the trouble to write out the complete schema. It suffices to remark
that a predict item $[l, r, A]$ is in fact an abbreviation of the existence of
either a pair of double dotted items

[H-> 7 .C(J, /i,/], [J5-^aA*/?,r, fc] with 

or a pair of double dotted items 

[D^7C.(5,r, fc], [B-^a*A(3,h,l] with 

Special items like $[S \to \bullet\bullet\gamma, 0, 0]$ and $[S \to \gamma\bullet\bullet, 0, 0]$ can be introduced to handle
initial cases.

11.8 HC parsing of unification grammars

We will describe a predictive HC chart parser for unification grammars. We 
give a schema in the notation of Chapter 8 of a parser that was described by 
Margriet Verlinden [1993]. 

We will first recall some bits of notation that have been formally intro- 
duced in Chapter 8. Here only an informal understanding of feature structures 
is needed. 

A constituent $X$ has a feature structure $\varphi(X)$ in the usual way. We do
not make a distinction between feature graphs, constraint sets and attribute-
value matrices (AVMs). In Chapter 8 we have formalized feature graphs and
constraint sets and shown that these are isomorphic. We use AVMs as an
informal notation for both.

Moreover, we have introduced composite feature structures that cover the 
features of a related set of objects. Consider a production 

\[
\begin{array}{l}
S \to NP\ VP \\
\langle S\ head\rangle \doteq \langle VP\ head\rangle \\
\langle VP\ subject\rangle \doteq \langle NP\rangle .
\end{array}
\qquad (11.5)
\]

It states not only that the VP has a subject, but also that the VP’s
subject is token identical with the NP. We write $\doteq$ for token identity and $=$
for type identity. In Figure 11.5 the constraint set in (11.5) is replaced by a
composite feature structure in AVM notation. In general we write $\varphi_0(A \to \alpha)$
for the constraints on a production $A \to \alpha$. We do not have a special notation
for composite AVMs, other than listing them together. In a composite feature
structure, coreferences may occur between different AVMs for different objects.



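[Figure 11.5: AVMs for S, NP and VP in the production S → NP VP. Each
carries its cat feature; the head of S is coreferent with the head of VP, and
the subject of VP is coreferent with the NP.]

Fig. 11.5. Constraints (11.5) denoted by a
composite feature structure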



Other composite objects for which we define composite feature structures
are items on a chart. In a conventional Earley parser we may obtain an item
$[S \to NP\ VP\bullet, 0, n]$. The features of all three constituents in the item, and
coreferences between these feature structures, are covered in a composite fea-
ture structure denoted $\varphi([S \to NP\ VP\bullet, 0, n])$. If some NP and VP are known,
the features of $[S \to NP\ VP\bullet, 0, n]$ can be computed by means of the equation

\[
\varphi([S \to NP\ VP\bullet, 0, n]) = \varphi_0(S \to NP\ VP) \sqcup \varphi(NP) \sqcup \varphi(VP). \qquad (11.6)
\]

It is important to notice that items on a chart are immutable. So, if features 
from some item are going to be used for the computation of features of another 
item, we use type identity (copying of features) rather than token identity 
(sharing of features). If we had written $\doteq$, rather than $=$, in equation
(11.6), something radically different would have been expressed, i.e., that the 
features of NP and VP themselves are merged into a larger feature structure 
for the final item, rather than copies of the feature structures of NP and 
VP. Features are never shared across different items on the chart. ^ 
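The difference between copying and sharing can be made concrete with a small sketch. It is only an illustration under simplifying assumptions: feature structures are plain nested Python dictionaries, coreferences inside a single structure are not modelled, and the name `unify` and the use of `None` for an inconsistent result are choices made for this sketch, not notation from this book.

```python
import copy


def unify(a, b):
    """Return a NEW structure unifying a and b, or None if they clash.

    Neither argument is modified and the result shares no sub-structure
    with them (copying, i.e. type identity rather than token identity).
    """
    if a == {}:                      # the empty structure unifies with anything
        return copy.deepcopy(b)
    if b == {}:
        return copy.deepcopy(a)
    if isinstance(a, dict) and isinstance(b, dict):
        result = {}
        for key in a.keys() | b.keys():
            if key in a and key in b:
                sub = unify(a[key], b[key])
                if sub is None:      # clash somewhere below this feature
                    return None
                result[key] = sub
            else:
                result[key] = copy.deepcopy(a[key] if key in a else b[key])
        return result
    # at least one atomic value: they must be equal
    return copy.deepcopy(a) if a == b else None


# Hypothetical feature structures of two items on the chart.
vp_item = {"head": {"agr": "3sg"}}
other = {"cat": "VP", "head": {"tense": "pres"}}
combined = unify(vp_item, other)
```

Because the result is built from copies, later changes to `combined` cannot leak back into `vp_item` or `other`, which is the immutability of chart items described above.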

The initial chart contains items for the words in the sentence with feature 
structures taken from the lexicon. If the lexicon offers multiple feature struc- 
tures for a single word, then multiple items for that word will be present on 
the initial chart. In a parsing schema it is specified how feature structures are 
to be computed for items that will be added to the chart. For each deduction 
step (in the context-free backbone) the feature structure of the consequent 
can be seen as a function of the feature structures of the antecedents. 

There is an important difference in prediction by an Earley chart parser 
and prediction by an LC or HC chart parser. In the Earley chart parser, 
prediction corresponds to stepwise stepping down along a path in the parse 
tree (cf. Figure 10.3 on page 206). In the LC and HC case, a goal is set and 
then one starts to parse bottom-up towards that goal (cf. Figures 10.5 and 
11.1 on pages 209 and 225). This leads to different approaches and problems 
in prediction of features. Prediction in an Earley chart parser may suffer 
from the defect that ever more complicated feature structures are added to 
the same context-free item. We have extensively discussed this in Section 9.5. 
This problem cannot occur in an LC/HC chart parser, because sequences of 
predict steps in the Earley sense do not occur. 

In context-free HC prediction we use the relation $>_h^*$ to decide whether a
recognized constituent can be the transitive head of a goal. If $A >_h^* B$ there is
some chain of productions such that $A \Rightarrow^* \gamma B \delta$, but we don’t know which one. The only
thing we know is that $B$ is a transitive head of $A$. It is possible to predict
some features of $B$ from features of $A$ only under some special conditions.
Suppose that some feature $f$ is always shared between the left-hand side of a
production and the head, for each production in the grammar. Then, through
any sequence of productions, $A$’s feature $f$ will be related to $B$’s feature $f$ if
$A >_h^* B$ holds, even though $A$ and $B$ may never occur together within a single
item. The $f$ feature of $B$ will percolate upwards through a chain of successive
items when we move bottom-up from $B$ towards the goal $A$. Such a feature
is called a transitive feature.

^ In an efficient implementation, however, some features can be shared under some
conditions, to minimize the amount of copying that needs to be done. These
conditions roughly amount to the principle that not a single feature, value, or
coreference can be added to an item because of any computation with any other
item. Hence, conceptually, we can see features of separate items as separate,
immutable structures. See also Section 9.3.
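As an illustration of the relation used in HC prediction, the following sketch computes which symbols are transitive heads of which. It rests on assumptions made only for this example: a production is a triple of left-hand side, right-hand side and a 1-based head position (0 for an empty right-hand side), the relation is taken to be the reflexive and transitive closure of the direct head relation, and the head positions in the sample grammar are chosen by hand.

```python
def transitive_heads(productions):
    """Return the set of pairs (A, B) such that B is a transitive head of A."""
    symbols = set()
    direct = set()                       # pairs (lhs, head of the production)
    for lhs, rhs, head in productions:
        symbols.add(lhs)
        symbols.update(rhs)
        if head > 0:
            direct.add((lhs, rhs[head - 1]))
    relation = {(x, x) for x in symbols} | direct
    changed = True
    while changed:                       # naive closure; fine for small grammars
        changed = False
        for a, b in list(relation):
            for c, d in direct:
                if b == c and (a, d) not in relation:
                    relation.add((a, d))
                    changed = True
    return relation


# Hypothetical head grammar: the VP heads S, the verb heads VP, the noun heads NP.
SAMPLE = [
    ("S", ("NP", "VP"), 2),
    ("VP", ("*v", "NP"), 1),
    ("NP", ("*det", "*n"), 2),
    ("NP", ("*n",), 1),
]
heads = transitive_heads(SAMPLE)
```

With this sample, a recognized `*v` can be the transitive head of a goal `S`, whereas a recognized `*n` can only be the transitive head of a goal `NP` (or of itself).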

A typical example of a transitive feature is the agr feature that is
used for agreement between VP and NP. The constraint that there must
be subject-verb agreement is laid down in the production $S \to NP\ VP$, which
can be found at the very top of a parse tree. The agreement features of the 
NP, however, are derived from some noun that is the lexical head of the NP. 
Similarly, the agreement features of the VP are derived from the main verb, 
the lexical head of the VP. So, if we have found a VP with agreement third 
person singular, we set as a sub-goal an NP with agreement third person 
singular. Because agreement is a transitive feature, we only need to look at 
third person singular nouns as possible candidates for a lexical head. 

Not only top-level features can be transitive, but also sub-features,
sub-sub-features and so on. In the following definition, therefore, we use a feature
sequence $\pi$, that addresses an arbitrary position in a nested feature structure,
rather than a feature $f$. For the time being we assume that a unification
grammar is obtained by adding features to a context-free head grammar 
(but in the sequel we reverse this and obtain the context-free heads from the 
features in a unification grammar). 

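Definition 11.14. (transitive features)

Let $\mathcal{G}$ be a unification grammar (cf. Definition 8.29) with a head grammar $G$
as context-free backbone.

A feature (sequence) $\pi$ is called transitive for a grammar $\mathcal{G}$ if for each produc-
tion $A \to \alpha\underline{B}\gamma$ where $\pi$ occurs as a feature for $A$ in $\varphi_0(A \to \alpha\underline{B}\gamma)$ the following
conditions hold:

(i) the constraint $\langle A\,\pi\rangle \doteq \langle B\,\pi\rangle$ occurs in $\varphi_0(A \to \alpha\underline{B}\gamma)$,

(ii) for each non-empty production $B \to \gamma\underline{C}\delta \in P$, the constraint $\langle B\,\pi\rangle \doteq
\langle C\,\pi\rangle$ occurs in $\varphi_0(B \to \gamma\underline{C}\delta)$.

For empty productions $A \to \varepsilon$, a feature $\langle A\,\pi\rangle$ may, but need not, be specified.

□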

In order to simplify notation, we will assume that all transitive features are 
sub-features of a single top-level feature called head. That is, we require that 

every production $A \to \alpha\underline{B}\gamma \in P$ has a constraint $\langle A\ head\rangle \doteq \langle B\ head\rangle$.

This condition can always be fulfilled. If some constituents have no transitive
features at all, then their head features will be empty. If some features are
transitive, but not sub-features of head, then such features will simply not be
taken into account in the HC prediction.

We will now turn this around and define a head unification grammar as
a unification grammar that obeys certain restrictions.

Definition 11.15. {head unification grammar) 

A unification grammar^ $\mathcal{G} \in \mathcal{UG}$ is called a head unification grammar if it
satisfies the following head property:

For each nonempty production $A \to X_1 \ldots X_k \in P$ there is a unique $i$
($1 \le i \le k$) such that

\[
\langle A\ head\rangle \doteq \langle X_i\ head\rangle
\]

is contained in $\varphi_0(A \to X_1 \ldots X_k)$.

The right-hand side symbol $X_i$ with $i$ according to the head property is called
the head of the production $A \to X_1 \ldots X_k$.

For an empty production the head property does not apply and we call $\varepsilon$ the
head of the production.

We write $h\mathcal{UG}$ for the class of head unification grammars. □

The head property is not an unreasonable demand on unification grammars.
In HPSG, for example, there is a general principle that the syntactic and
semantic features of a constituent are those of its head. So the restriction from
$\mathcal{UG}$ to $h\mathcal{UG}$ is not very severe. One can always turn a unification grammar
into a head unification grammar by adding empty head features (and first 
renaming possibly existing head features that were used for other purposes). 
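The head property can be checked mechanically once the constraints of a production are available in some explicit form. The sketch below assumes a representation invented for this example: each constraint is a pair of paths, and a path is a tuple whose first element is a position (0 for the left-hand side, i for the i-th right-hand side symbol) followed by feature names. The function name `head_of` is likewise hypothetical.

```python
def head_of(rhs_length, constraints):
    """Return the head position of a production, or None if the head
    property is violated. Position 0 stands for the empty production."""
    if rhs_length == 0:
        return 0                         # empty production: epsilon is the head
    lhs_head = (0, "head")
    candidates = [
        i for i in range(1, rhs_length + 1)
        if (lhs_head, (i, "head")) in constraints
        or ((i, "head"), lhs_head) in constraints
    ]
    # the head property demands exactly one such position
    return candidates[0] if len(candidates) == 1 else None


# Constraints (11.5) for S -> NP VP in this representation:
# <S head> = <VP head> and <VP subject> = <NP>.
S_NP_VP = {((0, "head"), (2, "head")), ((2, "subject"), (1,))}
```

For `S_NP_VP` the function returns 2, i.e. the VP is the head of the production; a production whose constraints do not relate the head feature of the left-hand side to exactly one right-hand side symbol yields `None`.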

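We will now define a basic feature structure $\varphi_0([l, r, A; B \to \alpha\bullet\beta\bullet\gamma, i, j])$
for a head-corner item. Whenever an item of this form is added to the chart,
it will be decorated with a feature structure that is obtained from this basic
composite feature structure and features from other items that caused it to
be recognized. The basic feature structure is defined by

\[
\varphi_0([l, r, A; B \to \alpha\bullet\beta\bullet\gamma, i, j]) = \varphi_0(B \to \alpha\beta\gamma) \sqcup \{\langle A\ head\rangle \doteq \langle B\ head\rangle\}, \qquad (11.7)
\]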

i.e., the basic composite feature structure of an item comprises the basic 
feature structure of the production in the recognized part, augmented with 
head coreference of the left-hand side symbol with the constituent in the 
predicted part. 

^ See Definition 8.29 for the class of unification grammars $\mathcal{UG}$.

After this preparatory work, we can extend the parsing schema pHC
straightforwardly to the class of grammars $h\mathcal{UG}$.

Schema 11.16. (pHC(UG))

We define a parsing system $\mathbb{P}_{pHC(UG)}$ for an arbitrary grammar $\mathcal{G} \in h\mathcal{UG}$. A
set of hypotheses is defined as in (11.1) on page 225, where (like in Chapter
8) it is assumed that for an item $[a, j-1, j]$, the feature $\varphi(a).cat$ gives a
lexical category for the $j$-th word of the sentence. Multiple items $[a, j-1, j]$
may be contained in the set $H$ of hypotheses.

We assume a single hypothesis $[\$, n, n+1]$ with $\varphi(\$) = [\,]$.

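The domain $\mathcal{I}_{pHC(UG)}$ is given by

\[
\begin{array}{lcl}
\mathcal{I}^{Pred} & = & \{[l, r, A] \mid A \in N \;\wedge\; 0 \le l \le r \;\wedge\; \varphi(A) \neq \bot\}, \\[4pt]
\mathcal{I}^{HC(i)} & = & \{[l, r, A; B \to \alpha\bullet\beta_1\underline{X}\beta_2\bullet\gamma, i, j]_\xi \mid A \in N \;\wedge\; A >_h^* B \;\wedge \\
 & & \quad B \to \alpha\beta_1\underline{X}\beta_2\gamma \in P \;\wedge\; 0 \le l \le i \le j \le r \;\wedge \\
 & & \quad \varphi_0(\xi) \sqsubseteq \varphi(\xi) \;\wedge\; \varphi(\xi) \neq \bot\}, \\[4pt]
\mathcal{I}^{HC(ii)} & = & \{[l, r, A; B \to \bullet\bullet, j, j]_\xi \mid A \in N \;\wedge\; A >_h^* B \;\wedge \\
 & & \quad B \to \varepsilon \in P \;\wedge\; 0 \le l \le j \le r \;\wedge \\
 & & \quad \varphi_0(\xi) \sqsubseteq \varphi(\xi) \;\wedge\; \varphi(\xi) \neq \bot\}
\end{array}
\]

with $\varphi_0(\xi)$ for an item $\xi$ as in (11.7).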

We add identifiers $\xi, \eta, \zeta, \ldots$ as subscripts to an item. By writing
$[l, r, A; B \to \alpha\bullet\beta_1 X \beta_2\bullet\gamma, i, j]_\xi$ we indicate that wherever $\xi$ is written elsewhere in the same
formula, this is an abbreviation for $[l, r, A; B \to \alpha\bullet\beta_1 X \beta_2\bullet\gamma, i, j]$.
The set of deduction steps $D_{pHC(UG)}$ is defined by adding the specification
of feature structures of consequents to the deduction steps of $D_{pHC}$. In most
cases this is entirely straightforward.

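\[
\begin{array}{lcl}
D^{Init} & = & \{[\$, n, n+1] \vdash [0, n, S]_\xi \\
 & & \quad \mid \varphi(S_\xi).cat = S \;\wedge\; \varphi(S_\xi).head = \varphi_0(S).head \\
 & & \quad (\text{where } \varphi_0(S) = \varphi_0(S \to \gamma)|_S \text{ for some } S \to \gamma \in P)\}, \\[4pt]
D^{HC(a)} & = & \{[l, r, A]_\eta,\ [b, j-1, j]_\zeta \vdash [l, r, A; B \to \alpha\bullet b\bullet\gamma, j-1, j]_\xi \\
 & & \quad \mid \varphi(\xi) = \varphi_0(\xi) \sqcup \varphi(A_\eta) \sqcup \varphi(b_\zeta)\}, \\[4pt]
D^{HC(A)} & = & \{[l, r, A; C \to \bullet\delta\bullet, i, j]_\eta \vdash [l, r, A; B \to \alpha\bullet C\bullet\gamma, i, j]_\xi \\
 & & \quad \mid \varphi(\xi) = \varphi_0(\xi) \sqcup \varphi(C_\eta)\}, \\[4pt]
D^{HC(\varepsilon)} & = & \{[l, r, A]_\eta \vdash [l, r, A; B \to \bullet\bullet, j, j]_\xi \\
 & & \quad \mid \varphi(\xi) = \varphi_0(\xi) \sqcup \varphi(A_\eta)\}, \\[4pt]
D^{lPred} & = & \{[l, r, A; B \to \alpha C\bullet\beta\bullet\gamma, i, j]_\eta \vdash [l, i, C]_\xi \\
 & & \quad \mid \varphi(C_\xi).cat = \varphi(C_\eta).cat \;\wedge\; \varphi(C_\xi).head = \varphi(C_\eta).head\}, \\[4pt]
D^{rPred} & = & \{[l, r, A; B \to \alpha\bullet\beta\bullet C\gamma, i, j]_\eta \vdash [j, r, C]_\xi \\
 & & \quad \mid \varphi(C_\xi).cat = \varphi(C_\eta).cat \;\wedge\; \varphi(C_\xi).head = \varphi(C_\eta).head\}, \\[4pt]
D^{lScan} & = & \{[a, j-1, j]_\eta,\ [l, r, A; B \to \alpha a\bullet\beta\bullet\gamma, j, k]_\zeta \\
 & & \quad \vdash [l, r, A; B \to \alpha\bullet a\beta\bullet\gamma, j-1, k]_\xi \\
 & & \quad \mid \varphi(\xi) = \varphi(\zeta) \sqcup \varphi(a_\eta)\}, \\[4pt]
D^{rScan} & = & \{[l, r, A; B \to \alpha\bullet\beta\bullet a\gamma, i, j]_\eta,\ [a, j, j+1]_\zeta \\
 & & \quad \vdash [l, r, A; B \to \alpha\bullet\beta a\bullet\gamma, i, j+1]_\xi \\
 & & \quad \mid \varphi(\xi) = \varphi(\eta) \sqcup \varphi(a_\zeta)\}, \\[4pt]
D^{lCompl} & = & \{[l, j, C; C' \to \bullet\delta\bullet, i, j]_\eta,\ [l, r, A; B \to \alpha C\bullet\beta\bullet\gamma, j, k]_\zeta \\
 & & \quad \vdash [l, r, A; B \to \alpha\bullet C\beta\bullet\gamma, i, k]_\xi \\
 & & \quad \mid \varphi(C_\zeta).cat = \varphi(C'_\eta).cat \;\wedge\; \varphi(\xi) = \varphi(\zeta) \sqcup \varphi(C'_\eta)\}, \\[4pt]
D^{rCompl} & = & \{[l, r, A; B \to \alpha\bullet\beta\bullet C\gamma, i, j]_\eta,\ [j, r, C; C' \to \bullet\delta\bullet, j, k]_\zeta \\
 & & \quad \vdash [l, r, A; B \to \alpha\bullet\beta C\bullet\gamma, i, k]_\xi \\
 & & \quad \mid \varphi(C_\eta).cat = \varphi(C'_\zeta).cat \;\wedge\; \varphi(\xi) = \varphi(\eta) \sqcup \varphi(C'_\zeta)\}, \\[6pt]
D_{pHC(UG)} & = & D^{Init} \cup D^{HC(a)} \cup D^{HC(A)} \cup D^{HC(\varepsilon)} \cup D^{lPred} \cup D^{rPred} \\
 & & \cup\; D^{lScan} \cup D^{rScan} \cup D^{lCompl} \cup D^{rCompl}.
\end{array}
\]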



In the predict steps it is to be understood that the predicted item has only 
two features: head (possibly with sub-features) and syntactic category. We 
could just have copied the entire feature structure of C into the predicted 
item. But the other features will not be used, so we may leave them out just 
as well. 

In the complete steps we have made a distinction between $C$ and $C'$. In
an item $[l, r, C; C' \to \bullet\delta\bullet, i, j]$ it is possible, but not necessary, to identify the
predicted $C$ with the recognized $C'$. If $C >_h^+ C$, then $C'$ could also be a
descendant of $C$. Unlike the context-free case, $C$ and $C'$ are not identical: it
holds that $\varphi(C).cat = \varphi(C').cat$ and $\varphi(C).head = \varphi(C').head$, but $C'$ may
have other features as well, which $C$ has not.

Thus we have completed the description of a parsing system $\mathbb{P}_{pHC(UG)} =
\langle \mathcal{I}_{pHC(UG)}, H, D_{pHC(UG)} \rangle$ for an arbitrary grammar $\mathcal{G} \in h\mathcal{UG}$. □

Note that, in general, parsing of unification grammars is not guaranteed
to halt. In Chapters 8 and 9 we have assumed that a grammar $\mathcal{G}$ is used
such that no infinite chain of deductions occurs in bottom-up direction, i.e.,
$\mathcal{V}(\mathbf{UG}(\mathcal{G})(a_1 \ldots a_n))$ is finite for any $a_1 \ldots a_n$. It is clear that this condition
suffices to guarantee that $\mathcal{V}(\mathbf{pHC(UG)}(\mathcal{G})(a_1 \ldots a_n))$ is finite as well.

11.9 Related approaches

The use of head-driven prediction to enhance the efficiency was suggested 
by Kay [1989]. Satta and Stock [1989] described the first head-driven chart 
parser. Their parser is purely bottom-up and does not use prediction. The 
buHC schema as described in 11.7 is closely related to the algorithm of Satta 
and Stock. The main difference is they do not prescribe whether one should 
proceed from the head to the left or to the right. Both cases are allowed; in 
either case the other way is blocked by keeping the appropriate administra- 
tion. The difference is marginal, however, because almost all productions in 
(man-made) natural language grammars are binary (or unary); in these cases 
there is no choice of direction. 

Satta and Stock [1994] discuss Head-Corner chart parsing in a general
framework for bidirectional parsing and give comparable worst-case
complexity bounds. Further Head-Corner parsing algorithms for context-free
grammars, based on variants of LR parsing (see Chapter 12), are introduced
by Nederhof and Satta [1994].

The context-free head grammars in Section 11.1 should not be confused 
with Head Grammars as introduced by Pollard [1984]. These can handle dis-
continuous constituents by means of “head wrapping”. Head Grammars ex-
tend the class of recognizable languages to mildly context-sensitive languages
[Joshi et al., 1991]. Van Noord [1991] describes a Prolog implementation of
a head-corner parser for languages with discontinuous constituents. A Head-
Corner parser for lexicalized Tree-Adjoining Grammars is given by van Noord
[1994]. Veenstra [1995] describes a Head-Corner parser for Chomsky’s [1992]
Minimalist Program. 

Bouma and van Noord [1993] have experimented with various parsing 
strategies for unification grammars and concluded that for important classes 
of grammars it is fruitful to apply parsing strategies that are sensitive to the 
linguistic notion of a head. 

The concept of Head-Corner parsing is no longer restricted to theoretical
discussions. Head-Corner parsing techniques have been successfully employed
in several natural language processing systems (which employ grammars tar-
geted to specific semantic domains): the SCHISMA theater information sys-
tem [op den Akker et al., 1995], the PLINIUS knowledge base of abstracts of
publications in Chemistry journals [ter Stal 1996], and the OVIS telephone
information system for public transport in the Netherlands [van Noord, 1996].
Van Noord discusses the Prolog implementation of the parser in detail and
shows that in this specific context the Head-Corner parser is much more ef-
ficient (roughly an order of magnitude) than rival bottom-up parsers that
have been considered for deployment in the OVIS system.

11.10 Conclusion

Head-Corner parsing is rather tricky, because of the non-contiguous manner 
in which a sentence is processed. Hence it is not a coincidence that the parsing 
schemata presented here are the most complicated ones in this book. The 
summit in this respect is the Head-Corner parsing schema for unification 
grammars, which reaches the limits of readability. What it offers, on the 
other hand, is a formal specification of feature percolation in a nontrivial 
parsing algorithm. 

This chapter shows the capabilities of the parsing schemata framework 
to get a formal grip on complicated parsers. The correctness proof of the 
HC schemata contains some bits of hand-waving (i.e., referring to the easier 
LC case) but within acceptable limits, even for a more theoretically inclined 
audience. The complexity analysis of sHC is useful, because it shows that the 
increase in administration does not need to affect the worst-case complexity. 

On a more practical level, it is justified to ask whether the additional 
complications of Head-Corner (rather than Left-Corner) parsing are worth 
the trouble. In my Ph.D. Thesis [Sikkel, 1993b] I wrote:

Head-Corner parsing is a nice idea - at a sufficiently abstract level.
[...] It is not clear whether the gain in efficiency offsets the increase
in bothersome details. Much depends on the grammar [...]

Since then, Head-Corner parsers have been successfully used in several unre-
lated natural language processing systems, and proved to be rather efficient.
In an environment where robust parsing is called for - the input could be
incorrect or incomplete, in particular when it is provided by a speech rec-
ognizer - Left-Corner parsers do not perform well [van Noord, 1996], and
Head-Corner techniques are clearly superior.

Generalized LR parsing became popular in the second half of the 1980s,
after the publication of Tomita’s algorithm [Tomita, 1985]. The theoretical
foundation of this approach is in fact much older and dates back to Lang
[1974].

In the context of this book, LR^ parsers are of interest because they are 
not chart parsers. In previous chapters we have argued that chart parsers fit 
into the parsing schemata framework in a trivial way. LR parsers are of quite 
a different nature, and it is to be expected that they fit into the framework 
in a nontrivial way. 

In this chapter we investigate how parsing schemata for LR parsers can
be defined. While chart parsers use items at run-time to guide the parsing pro-
cess, LR parsers use similar items at compile-time to compute the parsing table
in which the control functions are laid down. Therefore we will partly “un-
compile” the LR parsers and visualise how a sentence is processed by adding
run-time items to the LR parse stack. This allows a comparison between both
types of parsers at item level. It follows easily that the LR(0) parsing schema
is almost identical to the Earley schema defined in Chapter 4.

In the next chapter we will use this insight and cross-fertilize a parallel
Earley parser with Tomita’s algorithm so as to obtain a parallel Tomita
parser.

Chapters 12 and 13 are self-contained and can be read as a single, sepa- 
rate paper. In fact we will spend more than half of this chapter introducing 
Tomita’s algorithm. Deterministic LR parsing is part of the basic education 
of any computer scientist, but Generalized LR parsing is much less known 
in that field. Readers who are familiar with the basic traits of LR parsing 
can move straight to Section 12.3 and those who are familiar with Tomita’s 
algorithm may skip 12.3-12.5 as well. 



^ A note on terminology: The notion LR can be used in several more specific or
more generic senses. LR denotes deterministic LR parsers and GLR generalized
or nondeterministic LR parsers. When determinism is not at all relevant, we
write LR rather than the more cumbersome (G)LR. Furthermore, LR parsers
can be divided into SLR, LALR and (canonical) LR parsers. We use LR in the
wider sense, unless explicitly stated otherwise.

After some preliminaries in Section 12.1, LR parsing is informally
introduced in 12.2. The basic idea of Generalized LR parsing is stated in 12.3.
Tomita’s algorithm, treated in 12.4, is obtained by adding a graph-structured 
stack as an efficient data structure to cope with the nondeterminism of the 
LR parser. A formal definition is given in 12.5; this serves as a reference for 
the formal definition of our Parallel Bottom-up Tomita parser in the next 
chapter. Some pros and cons of Tomita’s algorithm are discussed in 12.6. 

In 12.7 we will partly uncompile the algorithm and introduce the “Anno- 
tated Tomita” variant that shows items also at run-time. Parsing schemata 
for LR(0)-based and SLR(1)-based Tomita parsers are given in 12.8. We will
prove the correctness of the SLR(1) schema. Some conclusions follow in 12.9.

The presentation of Tomita’s algorithm is based on Tomita [1985] (the 
formal definition in Section 12.5 is after Lankhorst [1991]). The comparison 
of Tomita’s algorithm with Earley’s algorithm is based on [Sikkel, 1991]. The 
presentation of this comparison has been simplified a lot, however, by making 
use of parsing schemata. 



12.1 Preliminaries 

A more extensive definition of context-free grammars has been given in Sec-
tion 3.1. Here we briefly summarize the notational conventions and recall some
standard notions of parsing theory that are needed for LR parsing.

Let $G = (N, \Sigma, P, S)$ be a context-free grammar. We write $V$ for $N \cup \Sigma$.
A grammar $G$ is called reduced if every symbol can occur in a parse, i.e.

(i) $\forall X \in V\ \exists \alpha, \beta \in V^* : S \Rightarrow^* \alpha X \beta$,

(ii) $\forall X \in V\ \exists x \in \Sigma^* : X \Rightarrow^* x$.

The only use of non-reduced grammars is to serve as counterexamples to
theorems. In this chapter we have to exclude them explicitly, to be formally
correct, because constituents $X$ that obey (ii) but not (i) will never be rec-
ognized by an LR parser.
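Conditions (i) and (ii) can be checked with two simple fixed-point computations. The following sketch is one way to do so; the function name `is_reduced`, the representation of productions as pairs of a left-hand side and a tuple of right-hand side symbols, and the sample grammar are assumptions made for this illustration.

```python
def is_reduced(nonterminals, terminals, productions, start):
    """Check conditions (i) and (ii) for a context-free grammar."""
    # (ii): every symbol derives some terminal string ("productive").
    productive = set(terminals)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in productions:
            if lhs not in productive and all(s in productive for s in rhs):
                productive.add(lhs)
                changed = True
    # (i): every symbol occurs in some sentential form ("reachable").
    reachable = {start}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in productions:
            if lhs in reachable:
                for s in rhs:
                    if s not in reachable:
                        reachable.add(s)
                        changed = True
    symbols = set(nonterminals) | set(terminals)
    return symbols <= productive and symbols <= reachable


# A small sample grammar (assumed for this example).
N = {"S", "NP", "VP", "PP"}
T = {"*n", "*det", "*v", "*prep"}
P = [("S", ("NP", "VP")), ("S", ("S", "PP")), ("NP", ("*n",)),
     ("NP", ("*det", "*n")), ("VP", ("*v",)), ("VP", ("*v", "NP")),
     ("PP", ("*prep", "NP"))]
```

Adding a nonterminal `X` with the single production `X → *n` gives a symbol that satisfies (ii) but not (i), which is exactly the kind of constituent the text excludes.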

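For each grammar $G = (N, \Sigma, P, S)$ we define an augmented grammar
$G' = (N', \Sigma', P', S')$ by

\[
\begin{array}{lcl}
N' & = & N \cup \{S'\}, \\
\Sigma' & = & \Sigma \cup \{\$\}, \\
P' & = & P \cup \{S' \to S\$\},
\end{array}
\]

with $S'$ and $\$$ symbols not occurring in $V$. We write $V'$ for $N' \cup \Sigma'$.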

The following notational conventions will be applied consistently throughout
this chapter. We write

$A, B, C, \ldots$ for variables ranging over $N'$,
$X, Y, \ldots$ for variables ranging over $V'$,
$a, b, \ldots$ for variables ranging over $\Sigma'$,
$x, z, \ldots$ for variables ranging over $\Sigma'^*$,
$\alpha, \beta, \gamma, \ldots$ for variables ranging over $V'^*$,
$\varepsilon$ for the empty string.

We write $\alpha \Rightarrow \beta$ if there are $\gamma_1, \gamma_2$ such that $\alpha = \gamma_1 A \gamma_2$, $\beta = \gamma_1 \delta \gamma_2$ and
$A \to \delta \in P'$.

We write $\alpha \Rightarrow_{rm} \beta$ if there are $\gamma, x$ such that $\alpha = \gamma A x$, $\beta = \gamma \delta x$ and $A \to \delta \in
P'$.

A string $\gamma$ is called a sentential form if $S \Rightarrow^* \gamma$.
A string $\gamma$ is called a rightmost sentential form if $S \Rightarrow_{rm}^* \gamma$.

A derivation $S \Rightarrow_{rm} \ldots \Rightarrow_{rm} \gamma$ is called a rightmost derivation of $\gamma$.

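The functions First and Follow are redefined for augmented grammars
(but to the same effect as First and Follow in Definition 6.10).

The function $\mathrm{Follow} : N \to \wp(\Sigma')$ defines the terminal symbols that can
follow a given nonterminal in a sentential form, i.e.,

\[
\mathrm{Follow}(A) = \{a \mid \exists \alpha, \beta : S' \Rightarrow^* \alpha A a \beta\}.
\]

The function $\mathrm{First} : V'^* \to \wp(\Sigma')$ is defined as follows. If $\alpha \Rightarrow^* a\beta$ then $a \in
\mathrm{First}(\alpha)$. Furthermore, if $\alpha \Rightarrow^* \varepsilon$ then any terminal that can follow $\alpha$ in a
sentential form is also contained in $\mathrm{First}(\alpha)$. Formally,

\[
\mathrm{First}(\alpha) = \{a \mid \exists \beta, \gamma, \delta : S' \Rightarrow^* \beta\alpha\gamma \;\wedge\; \alpha\gamma \Rightarrow^* a\delta\}.
\]

We will use First also with a set of strings as parameter. It should be obvious
that

\[
\mathrm{First}(\{\alpha_1, \ldots, \alpha_k\}) = \mathrm{First}(\alpha_1) \cup \ldots \cup \mathrm{First}(\alpha_k).
\]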
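A minimal sketch of how these sets can be computed by fixed-point iteration is given below. It is an illustration under stated assumptions: the function and variable names are invented for this example, productions are pairs of a left-hand side and a tuple of right-hand side symbols, and the helper `first_of` computes the usual textbook first set of a string, which coincides with the definition above only for strings that cannot derive the empty string (for a string that can, the definition above additionally includes the terminals that may follow it). The sample grammar is an augmented grammar without empty productions, so the difference does not arise.

```python
def first_and_follow(productions):
    """Return (first, follow): dicts mapping nonterminals to terminal sets."""
    nonterminals = {lhs for lhs, _ in productions}
    nullable = set()
    first = {a: set() for a in nonterminals}
    follow = {a: set() for a in nonterminals}

    def first_of(string):
        """Terminals that can start `string`, and whether it derives epsilon."""
        out = set()
        for s in string:
            if s not in nonterminals:        # a terminal ends the scan
                out.add(s)
                return out, False
            out |= first[s]
            if s not in nullable:
                return out, False
        return out, True

    changed = True
    while changed:
        changed = False
        for lhs, rhs in productions:
            f, eps = first_of(rhs)
            if eps and lhs not in nullable:
                nullable.add(lhs)
                changed = True
            if not f <= first[lhs]:
                first[lhs] |= f
                changed = True
            for i, s in enumerate(rhs):
                if s in nonterminals:
                    f, eps = first_of(rhs[i + 1:])
                    new = f | (follow[lhs] if eps else set())
                    if not new <= follow[s]:
                        follow[s] |= new
                        changed = True
    return first, follow


# Sample augmented grammar (assumed for this example).
AUGMENTED = [("S'", ("S", "$")), ("S", ("NP", "VP")), ("S", ("S", "PP")),
             ("NP", ("*n",)), ("NP", ("*det", "*n")), ("VP", ("*v",)),
             ("VP", ("*v", "NP")), ("PP", ("*prep", "NP"))]
first, follow = first_and_follow(AUGMENTED)
```

For this sample, a VP can only be followed by `*prep` or `$`, while an NP can additionally be followed by `*v`.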



12.2 LR parsing 

A brief, informal introduction to (deterministic) LR parsing is given in this 
section. We refer to the abundant literature for a more comprehensive treat- 
ment. 

The theory of LR parsing has been covered by many authors. LR parsing 
was introduced by Knuth [1965]. More efficient variants, viz. SLR and LALR 
parsing, were defined by DeRemer [1969, 1971]. But LR parsing became a 
useful technique for compiler construction only after automatic generation 
of parsing tables became feasible. This was first described by Lalonde et al. 
[1971]. A well-known LALR(1) compiler-compiler is Yacc [Johnson 1975].

Further treatments of LR parsing theory are given by Aho and Ullman [1972,
1977], Harrison [1978], Aho, Sethi and Ullman [1986], Grune and Jacobs
[1990], and Leermakers [1993]. We follow Aho and Ullman in the sense that
states of a parser are introduced as sets of LR-items. Sippu and Soisalon- 
Soininen [1990] follow a more theoretical line and define states of a parser 
as equivalence classes of viable prefixes. An extensive bibliography on LR 
parsing is given by Nijholt [1983]. 

An LR parser is a deterministic push-down automaton. It uses a single 
data structure, a stack containing states. The top element of the stack is the 
state the parser is in. The parser proceeds through the sentence by two types 
of actions: 

• shift: a word is read from the sentence and a new state is pushed onto the 
stack; 

• reduce: a sequence of states is popped from the stack and a new state is 
pushed onto the stack. 

There are two additional actions that stop the parser: an error will occur if 
the string being parsed is not a valid sentence; an accept action acknowledges 
the fact that a valid sentence has been scanned. 

The next action is determined by the state and a prefix of the remainder 
of the input. LR parsers differ according to how many words are used to 
determine the next action. Usually a single word look-ahead is used. 

In down-to-earth examples of LR parsers the general idea of a push-down 
automaton is slightly modified. For illustrative purposes, the states on the 
stack are interlaced with grammar symbols. These grammar symbols repre- 
sent parts of the parse that have been determined so far. In the remainder of 
this chapter we will only use this more legible form of LR parsers. 

As an example grammar in this section we use the following grammar
G3 (that is specifically designed to highlight some interesting aspects of LR
parsing):

(1) S → NP VP
(2) S → S PP
(3) NP → *n
(4) NP → *det *n
(5) VP → *v
(6) VP → *v NP
(7) PP → *prep NP.

The action function is coded into a parsing table. The parsing table for
G3 is shown in Figure 12.1. The action table is a matrix in which the next
action can be found for every (top of stack) state and lexical category. The
end-of-sentence marker $ is taken to be the next lexical category when the
entire sentence has been scanned. The goto table is used to determine the
next state in case of a reduce and will be explained by an example shortly.
The table also contains a column labelled LR(0) items that we will ignore
for the time being.

[Figure 12.1: parsing table for G3, giving for each state the action entries,
the goto entries and the LR(0) items.]

Fig. 12.1. A parsing table for G3

The shift actions are denoted by “sh k” with k a state number. Re-
duce actions are denoted by “re p” with p (the number of) a produc-
tion of the grammar. Empty entries in the action table denote errors;
the accept action is abbreviated acc. We will parse our canonical exam-
ple sentence the cat catches a mouse represented by the lexical categories
*det *n *v *det *n. We show the working of the parser by a sequence of
configurations that represent the entire stack and the remainder of the input.
The top of the stack is at the right, next to the remaining input. We start
with only the initial state 0 as the stack contents.

0 *det *n *v *det *n $. (12.1) 

In the action table for state 0 and category *det we find “5/zl”. The *det is 
shifted and the next state is 1: 

0~-*det~l *n *v *det *n $. (12.2) 

In the action table we find “5/18” at table entry action[l^ *n\. Hence the next 
configuration is 

{)—*det—l—*n—Z *v *det *n $. (12.3) 

The next action (for 3 and *v) is “re4”. This causes the following steps. 
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• The topmost two states and grammar symbols *det — 1 — *n — 3 are deleted 
from the stack. These represent the right-hand side of production 4. 

• The next state is determined by the top of the truncated stack and the 
left-hand side symbol of production 4. In the goto table we find that state 
0 and nonterminal NP yield state 4. 

• The left-hand side symbol and new state are pushed onto the stack. 

This reduction yields the new configuration 

0— iVP— 4 *det *n $. (12.4) 

Next, we find action[4,*v] = sh5, yielding 

0—NP—A—*v—5 *det *n $. (12.5) 

Note that in state 5 the action to be taken depends on the next word. If it 
is *prep or $, then the verb phrase comprises only a *v, which should be 
reduced now. If a *det or *n follows, on the other hand, the verb phrase 
contains an object, which should be shifted first. In this case we find 
action[5,*det] = sh1. Proceeding in similar fashion, we get a sequence of 
configurations 

0—NP—4—*v—5—*det—1 *n $ ; (12.6) 

0—NP—4—*v—5—*det—1—*n—3 $ ; (12.7) 

0—NP—4—*v—5—NP—6 $ ; (12.8) 

0—NP—4—VP—7 $ ; (12.9) 

0—S—8 $. (12.10) 



Finally we find action[8,$] = accept, i.e., the sentence was indeed correct. 

So far we have recognized the sentence but not yet constructed a parse. 
This is done as follows. From each configuration we can derive a rightmost 
sentential form (if the sentence was accepted) by concatenating the stack and 
the remaining input and deleting the states and the end-of-sentence marker: 

*det *n *v *det *n, (12.11) 

NP *v *det *n, (12.12) 

NP *v NP, (12.13) 

NP VP, (12.14) 

S. (12.15) 



Shifts leave the sentential form unchanged; each reduction produces a new one. 
The rightmost sentential forms (12.11)-(12.15) constitute a rightmost 
derivation in reverse order. Hence all that has to be done to uniquely encode 
the parse tree is to output the sequence of reductions (and output whether 
the parser was stopped by accept or error). 

function closure(I: set of items): set of items; 
begin 
    items := I; 
    while there is an item A→α•Bγ ∈ items 
        and a production B→β ∈ P such that B→•β ∉ items 
    do items := items ∪ {B→•β} od; 
    closure := items; 
end; 

Fig. 12.2. The closure of a set of LR(0) items 

The parser is called an LR parser because it reads the input from Left to 
right and constructs a Rightmost derivation (in reverse). There are various 
types of LR parsers that we will not discuss here. The current one is called 
an SLR(1) parser; it is a simple LR parser and uses one symbol look-ahead. 
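
The walkthrough above can be replayed with a small table-driven driver. The following Python sketch is illustrative only: the tables contain just the entries quoted in the text (not the full Figure 12.1), and, because the text gives a number only for production 4, rules are stored by value rather than by number. The names `ACTION`, `GOTO` and `parse` are choices made for this sketch.

```python
# Partial SLR(1) table: only the entries quoted in the walkthrough above.
# Rules are stored as (lhs, rhs) pairs, which is an assumption of this sketch.
NP_RULE = ("NP", ("*det", "*n"))
ACTION = {
    (0, "*det"): ("shift", 1),
    (1, "*n"): ("shift", 3),
    (3, "*v"): ("reduce", NP_RULE),
    (3, "$"): ("reduce", NP_RULE),
    (4, "*v"): ("shift", 5),
    (5, "*det"): ("shift", 1),
    (6, "$"): ("reduce", ("VP", ("*v", "NP"))),
    (7, "$"): ("reduce", ("S", ("NP", "VP"))),
    (8, "$"): ("accept", None),
}
GOTO = {(0, "NP"): 4, (5, "NP"): 6, (4, "VP"): 7, (0, "S"): 8}


def parse(tokens):
    """Return (accepted, reductions, sentential_forms) for a token list."""
    stack = [0]                       # states and symbols alternate: 0 X s X s
    rest = list(tokens) + ["$"]
    reductions = []
    forms = [" ".join(tokens)]
    while True:
        entry = ACTION.get((stack[-1], rest[0]))
        if entry is None:             # empty table entry: error
            return False, reductions, forms
        kind, arg = entry
        if kind == "accept":
            return True, reductions, forms
        if kind == "shift":
            stack += [rest.pop(0), arg]
        else:                         # reduce lhs -> rhs
            lhs, rhs = arg
            del stack[len(stack) - 2 * len(rhs):]   # pop |rhs| symbol/state pairs
            stack += [lhs, GOTO[(stack[-1], lhs)]]
            reductions.append(arg)
            # stack symbols followed by the remaining input, without states and $
            forms.append(" ".join(stack[1::2] + rest[:-1]))


accepted, reductions, forms = parse("*det *n *v *det *n".split())
for form in forms:
    print(form)
```

Each reduction appends one sentential form, so the printed list corresponds to (12.11)-(12.15), and `reductions` is the sequence that encodes the parse tree.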

In the example, we identified states by a number. This is only for easy 
reference. A state in fact constitutes a set of so-called LR(0) items, cf. 
Figure 12.1. An item is an object of the form A→α•β with A→αβ a production. 
Unlike Earley items, the LR(0) items do not contain position markers. 
Whenever a state s occurs on top of the stack and A→α•β ∈ s, the parser 
has recognized α somewhere in the sentence. The positions delineating α can 
be derived from the composition of the stack. In Section 12.7 we will do so 
explicitly. 

LR(0) items with a dot in rightmost position are called final items; those 
of the form A→•β with a dot in leftmost position are called initial items. 

The initial state 0 contains the item S'→•S$, i.e., we have to start 
recognizing the entire sentence. There are two ways to recognize a sentence: 
by S→NP VP or by S→S PP. For either rewrite rule we add an initial item, 
S→•NP VP and S→•S PP, respectively. Similarly, there are two rewrite 
rules for NP and we add NP→•*n and NP→•*det *n to state 0. In this way 
we have computed the closure of {S'→•S$}. An algorithmic definition of 
closure is given in Figure 12.2. 
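
As an illustration of the closure computation for state 0, here is a minimal Python sketch. The item encoding `(lhs, rhs, dot)` and the grammar list are assumptions of this sketch; in particular the rule PP → *prep NP is not spelled out in this section and is assumed here.

```python
# An item (lhs, rhs, dot) stands for the production lhs -> rhs with the dot
# placed before rhs[dot]; dot == len(rhs) marks a final item.
GRAMMAR = [
    ("S'", ("S", "$")),
    ("S", ("NP", "VP")),
    ("S", ("S", "PP")),
    ("NP", ("*n",)),
    ("NP", ("*det", "*n")),
    ("VP", ("*v",)),
    ("VP", ("*v", "NP")),
    ("PP", ("*prep", "NP")),      # assumed; not listed in this section
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def closure(items):
    """Add an initial item B -> .beta for every nonterminal B right of a dot."""
    result = set(items)
    work = list(items)
    while work:
        _, rhs, dot = work.pop()
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for lhs2, rhs2 in GRAMMAR:
                item = (lhs2, rhs2, 0)
                if lhs2 == rhs[dot] and item not in result:
                    result.add(item)
                    work.append(item)
    return frozenset(result)


def show(item):
    """Render an item in dotted notation, e.g. NP -> *det . *n"""
    lhs, rhs, dot = item
    return f"{lhs} -> {' '.join(rhs[:dot] + ('.',) + rhs[dot:])}"


state0 = closure({("S'", ("S", "$"), 0)})
for item in sorted(state0):
    print(show(item))
```

The result contains the start item plus the four initial items named in the text.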

A sentence could start with *det, as in our example sentence. For that 
case, the action table must contain a shift for state 0 and *det. The new 
state (labelled 1) is obtained by moving the dot over *det. We take the 
closure again, but no initial items are added because the symbol following 
the dot is a terminal. Similarly, a shift and a new state are defined for the 
case that a sentence starts with *n. 

In state 1 only a single action is possible. One has just shifted a *det and 
this must be followed by shifting a *n. This leads to state 3, {NP→*det *n•}, 
containing a single final item. The only feasible action is re4. Note that this 
is entered into the action table only for symbols in FOLLOW(NP). If, e.g., 
another *n were to follow, the input is not a correct sentence and the parser 
could stop right away. 

function next_state(I: set of items, X: symbol): set of items; 
begin 
    next_state := closure({A→αX•β | A→α•Xβ ∈ I}) 
end; 

function all_states: set of sets of items; 
begin 
    C := {closure({S'→•S$})}; 
    while there is an item set I ∈ C and a symbol X ∈ V 
        such that next_state(I, X) ≠ ∅ and next_state(I, X) ∉ C 
    do C := C ∪ {next_state(I, X)} od; 
    all_states := C 
end; 

Fig. 12.3. Computation of the set of states 

If *det *n is reduced to NP, the symbols and state numbers *det — 1 — 
*n — 3 are replaced by NP and a new state number. This new state number 
should be found in the goto table. Hence, goto[0,NP] yields a new state, 
labelled 4: 

closure({S→NP•VP}) = {S→NP•VP, VP→•*v, VP→•*v NP}. 

Shifting a *v moves us to state 5. Both rewrites of VP can start with a *v, 
hence state 5 comprises 

closure({VP→*v•, VP→*v•NP}). 

The remainder of the table is computed in similar fashion. Worth mentioning 
is state 8, which contains an item S'→S•$. An entire sentence has been 
recognized, hence action[8,$] = accept. It is conceivable, however, that the 
input string has not been processed completely. A prepositional phrase may 
follow, hence action[8,*prep] yields a shift. 

An algorithmic definition of the set of states and the parsing table is given 
in Figures 12.3 and 12.4. 
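
The state construction can be tried out with the following Python sketch. It is a reading of the definitions above, not the book's implementation: the grammar is a reconstruction (the PP rule is assumed), items are `(lhs, rhs, dot)` triples, and states are identified by their item sets rather than by the numbers used in Figure 12.1.

```python
GRAMMAR = [
    ("S'", ("S", "$")),
    ("S", ("NP", "VP")), ("S", ("S", "PP")),
    ("NP", ("*n",)), ("NP", ("*det", "*n")),
    ("VP", ("*v",)), ("VP", ("*v", "NP")),
    ("PP", ("*prep", "NP")),      # assumed; not listed in this section
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}
# Every grammar symbol except the end marker; accept is handled separately.
SYMBOLS = sorted({x for _, rhs in GRAMMAR for x in rhs} - {"$"})


def closure(items):
    result, work = set(items), list(items)
    while work:
        _, rhs, dot = work.pop()
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for lhs2, rhs2 in GRAMMAR:
                item = (lhs2, rhs2, 0)
                if lhs2 == rhs[dot] and item not in result:
                    result.add(item)
                    work.append(item)
    return frozenset(result)


def next_state(items, symbol):
    """Move the dot over `symbol` wherever possible, then take the closure."""
    moved = {(l, r, d + 1) for l, r, d in items if d < len(r) and r[d] == symbol}
    return closure(moved)


def all_states():
    """Collect every non-empty item set reachable from the initial state."""
    start = closure({("S'", ("S", "$"), 0)})
    states, work = [start], [start]
    while work:
        state = work.pop()
        for x in SYMBOLS:
            nxt = next_state(state, x)
            if nxt and nxt not in states:
                states.append(nxt)
                work.append(nxt)
    return states


state0 = closure({("S'", ("S", "$"), 0)})
state4 = next_state(state0, "NP")     # the state reached via goto[0,NP]
state5 = next_state(state4, "*v")     # the state reached by shifting *v
print(len(all_states()), "states for the assumed grammar")
```

Here `state4` and `state5` contain the item sets given in the text for the states labelled 4 and 5.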



12.3 Generalized vs. deterministic LR parsing 

An LR parser, in order to be deterministic, may only have a single action in 
each table entry. If an entry contains more than one action there is a conflict 
and the parser doesn’t know what to do. A grammar is called SLR(1) if the 
parsing table for that grammar does not contain any conflict. A language 
is called SLR(1) if it can be described by an SLR(1) grammar. The class of 
SLR(1) grammars is a severely restricted subset of the class of context-free 
grammars. A necessary (but not sufficient) condition is that the grammar is 
not ambiguous. While most programming languages can be described by LR 
grammars, this clearly does not hold for natural language grammars. 

procedure construct SLR(1) table 
begin 
    C := all_states; 
    for each I ∈ C 
    do for each a ∈ Σ' do action[I,a] := ∅ od; 
       for each item ∈ I 
       do case item of 
            A→α•aβ: 
                if A→α•aβ = S'→S•$ 
                then action[I,$] := action[I,$] ∪ {accept} 
                else action[I,a] := 
                     action[I,a] ∪ {shift next_state(I,a)} 
                fi 
            A→α•Bβ: 
                goto[I,B] := next_state(I,B) 
            A→α•: 
                for each a ∈ Follow(A) 
                do action[I,a] := 
                   action[I,a] ∪ {reduce A→α} 
                od 
          esac od; 
       for each a ∈ Σ' do 
           if action[I,a] = ∅ then action[I,a] := {error} fi 
       od od 
end; 

Fig. 12.4. The computation of an SLR(1) parsing table 

With some more sophistication, however, LR parsing techniques can be 
used for natural language grammars. The central idea is to replace the word 
conflict by ambiguity. Thus we obtain a nondeterministic pushdown automa- 
ton that is known as a Generalized LR (GLR) parser. If the state of the parser 
and the look-ahead allow for different actions, a nondeterministic choice is 
made. A sentence is correct if and only if there is some run of the 
nondeterministic LR parser that accepts it. More specifically, the set of 
parse trees of a sentence is characterized by (the rightmost derivations 
produced by) all successful runs of the parser. 

Nondeterministic automata are useful constructs only from a theoretical 
perspective. If we are to find all parse trees for a given sentence, we need 
some practical way to determine all successful runs of the nondeterministic 
machine. A general approach to handle nondeterministic push-down transducers 
dates back to an early paper of Lang [1974], but it remained rather unknown 
until the mid-eighties, when Tomita [1985] published his Generalized LR 
algorithm, written for an audience of computational linguists rather 
than theoretical computer scientists. A similar algorithm was independently 
discovered by van der Steen [1987]. 

In Section 12.4 we will give an informal introduction to Tomita’s algorithm. 
A formal definition is presented in Section 12.5. 



12.4 Tomita’s algorithm 

For an exposition of Tomita’s algorithm we use the canonical example grammar 
G_4 (which is obtained by adding a production NP→*n to grammar G_2 
that was used in previous examples), defined by the productions 

(1) S → NP VP          (5) NP → NP PP 
(2) S → S PP           (6) PP → *prep NP 
(3) NP → *n            (7) VP → *v NP. 
(4) NP → *det *n 


The canonical example sentence is “I saw a man with a telescope”, represented 
by the lexical categories 

*n *v *det *n *prep *det *n. (12.16) 

Both parses are represented in Figure 12.5, in a structure that is called a 
shared forest; “forest” because it comprises a set of trees, “shared” because 
identical subtrees are represented only once. 



Fig. 12.5. A shared forest 



The parsing table is shown in Figure 12.6. Ambiguities arise in states 
11 and 12. With look-ahead *prep, both a shift and a reduce are possible, 
depending on where the PP is to be attached. 
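
To see where these conflicts come from, one can build the SLR(1) action sets for G_4 and list the entries holding more than one action. The Python sketch below follows the construction described earlier in this chapter; it is an illustration under stated assumptions, not the book's code. States are identified by their item sets, so the numbers 11 and 12 of Figure 12.6 do not appear, and the simplified FIRST/FOLLOW computation relies on G_4 having no empty productions.

```python
# G_4 as listed above, augmented with S' -> S $.
GRAMMAR = [
    ("S'", ("S", "$")),
    ("S", ("NP", "VP")), ("S", ("S", "PP")),
    ("NP", ("*n",)), ("NP", ("*det", "*n")), ("NP", ("NP", "PP")),
    ("PP", ("*prep", "NP")), ("VP", ("*v", "NP")),
]
NT = {lhs for lhs, _ in GRAMMAR}
SYMBOLS = sorted({x for _, rhs in GRAMMAR for x in rhs} - {"$"})


def closure(items):
    result, work = set(items), list(items)
    while work:
        _, rhs, dot = work.pop()
        if dot < len(rhs) and rhs[dot] in NT:
            for lhs2, rhs2 in GRAMMAR:
                item = (lhs2, rhs2, 0)
                if lhs2 == rhs[dot] and item not in result:
                    result.add(item)
                    work.append(item)
    return frozenset(result)


def next_state(items, x):
    return closure({(l, r, d + 1) for l, r, d in items if d < len(r) and r[d] == x})


def all_states():
    start = closure({("S'", ("S", "$"), 0)})
    states, work = [start], [start]
    while work:
        state = work.pop()
        for x in SYMBOLS:
            nxt = next_state(state, x)
            if nxt and nxt not in states:
                states.append(nxt)
                work.append(nxt)
    return states


def follow_sets():
    """FOLLOW by fixpoint iteration; valid only without empty productions."""
    first = {a: set() for a in NT}
    follow = {a: set() for a in NT}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in GRAMMAR:
            head = first[rhs[0]] if rhs[0] in NT else {rhs[0]}
            if not head <= first[lhs]:
                first[lhs] |= head
                changed = True
            for i, x in enumerate(rhs):
                if x not in NT:
                    continue
                if i + 1 == len(rhs):
                    new = follow[lhs]
                else:
                    nxt = rhs[i + 1]
                    new = first[nxt] if nxt in NT else {nxt}
                if not new <= follow[x]:
                    follow[x] |= new
                    changed = True
    return follow


def slr1_conflicts():
    """Return (look-ahead, actions) for every entry with more than one action."""
    follow = follow_sets()
    found = []
    for state in all_states():
        actions = {}
        for lhs, rhs, dot in state:
            if dot == len(rhs):                    # final item: reduce
                for a in follow[lhs]:
                    actions.setdefault(a, set()).add(("reduce", lhs, rhs))
            elif rhs[dot] not in NT:               # dot before a terminal
                kind = "accept" if rhs[dot] == "$" else "shift"
                actions.setdefault(rhs[dot], set()).add((kind,))
        found += [(a, acts) for a, acts in actions.items() if len(acts) > 1]
    return found


for lookahead, acts in slr1_conflicts():
    print(lookahead, sorted(acts))
```

For G_4 this yields two shift/reduce entries, both on look-ahead *prep: one with the reduction PP → *prep NP and one with VP → *v NP.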

A first, naive approach to nondeterministic LR parsing is the following. 
Whenever an ambiguity arises, a different copy of the stack is made for each 
possible action. Thus we get a set of stacks that is managed in parallel. If 
some stack brings the parser in a state where no action is possible, this stack 
is discarded. Hence, the set of stacks that remains when the entire sentence 
has been processed yields the set of parse trees for the sentence. 
Fig. 12.6. A parsing table for G_4 



The various stacks are synchronized on shift actions. That is, all possible 
reductions are carried out until each stack is to do a shift. In Figure 12.7 the 
set of stacks is shown that is obtained after parsing a (prefix of a) string 

*n *v *det *n *prep *det *n *prep *det *n *prep . 

The topmost 5 stacks are identical, but correspond to the 5 different parses 
for a sentence ending with two PPs. This is clearly an inefficient way of 
working. If two stacks have the same top state, they will behave identically 
up to the moment that this state is removed by a reduction. Hence identical 
top parts of stacks can be merged. Thus we obtain a tree-structured stack, 
shown in Figure 12.8. In this case there is only a single top state; in 
general there may be several top states. 

Fig. 12.7. Maintaining a set of stacks 



A second optimization is possible. We could also share bottom parts of 
the stacks, when a copy of a stack has to be made. Thus we obtain the 
graph-structured stack as shown in Figure 12.9. Note that each single stack 
in Figure 12.7 corresponds to a path in the tree in Figure 12.8 and to a path 
in the graph in Figure 12.9. All three figures contain the same information. 
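
The claim that a graph-structured stack is a compact encoding of a set of ordinary stacks can be made concrete with a few lines of Python. This is a hypothetical sketch: the class `Vertex`, the function `stacks`, and the tiny example graph (including the placeholder symbol "X" standing for an elided stack segment) are inventions for illustration and do not reproduce Figure 12.9.

```python
class Vertex:
    """A state or symbol vertex; edges point towards the bottom of the stack."""

    def __init__(self, label, successors=()):
        self.label = label
        self.successors = list(successors)


def stacks(top):
    """Enumerate every stack (bottom first) encoded by the paths below `top`."""
    if not top.successors:
        return [[top.label]]
    return [path + [top.label] for s in top.successors for path in stacks(s)]


# Two branches that share the bottom state 0 and the top part "*prep 6".
bottom = Vertex(0)
left = Vertex(12, [Vertex("X", [bottom])])     # "X": placeholder symbol
right = Vertex(1, [Vertex("S", [bottom])])
top = Vertex(6, [Vertex("*prep", [left, right])])

for stack in stacks(top):
    print(stack)
```

The graph holds seven vertices, whereas the two stacks it encodes would need ten entries when stored separately; the saving grows with the number of branches.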

In order to formally define a generalized LR parser with a graph-structured 
stack, one has to keep in mind that the graph is in fact a compact 
representation of a set of stacks defined by the paths in the graph. Each 
stack is operated by its own nondeterministic LR parser; all parsers 
synchronize on shifts. Hence it is clear how to derive a definition of a GLR 
parser from the definition of a deterministic LR parser. The result is rather 
complicated, however, and we do not take the trouble to write it down. In 
Section 12.5 a formal definition is given of an optimized version of the GLR 
parser that will be discussed next. 

So far we have only considered recognition of a sentence by a GLR parser. 
In order to yield a forest of parse trees, we have to keep some additional 
administration. We will maintain a parse list of nodes that occur in the 
parse forest with pointers to their daughter nodes. To that end, the algorithm 
is modified as follows. 

• Upon a shift, the terminal that is shifted is added to the parse list. The 
symbol vertex is labelled with the index in the parse list, rather than with 
the symbol itself. 

• Similarly, upon a reduce, an entry into the parse list is made for the left- 
hand side symbol of the reduced production. A list of pointers to its daugh- 
ter nodes (the just removed indices of right-hand side symbols) is contained 
in the parse list entry. 

Figure 12.10 shows the parse list corresponding to the shared forest of “I saw 
a man with a telescope”. 

Fig. 12.10. List representation of a shared forest 



When a sentence has n parse trees, then the shared forest will have n 
root nodes. The shared forest of the 5 parse trees of the sentence “I saw a 
man in the park with a telescope” is shown in Figure 12.11. But, just as we 
share bottom parts of parse trees, we could also share top parts of parse 
trees. If a nonterminal symbol rewrites to the same part of the sentence in 
different ways, it needs to be represented only once. The different nodes in the 
shared forest are grouped into a single so-called packed node that comprises 
several sub-nodes. This is illustrated in Figure 12.12, where packed nodes 
are represented by rectangles and sub-nodes by symbols contained in the 
rectangle. The graph structure that is obtained in this way is called a packed 
shared forest. 
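
The numbers of parse trees quoted here (two for one trailing PP, five for two) can be checked independently of any LR machinery. The Python sketch below simply counts derivations of G_4 over token spans with memoization; it is a plain counting routine, not Tomita's algorithm, and the names `RULES` and `count_parses` are choices made for this sketch.

```python
from functools import lru_cache

# G_4 as a map from nonterminal to its right-hand sides.
RULES = {
    "S": [("NP", "VP"), ("S", "PP")],
    "NP": [("*n",), ("*det", "*n"), ("NP", "PP")],
    "PP": [("*prep", "NP")],
    "VP": [("*v", "NP")],
}


def count_parses(tokens, start="S"):
    """Count the parse trees of `tokens` (lexical categories) rooted in `start`."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(sym, i, j):
        if sym not in RULES:                  # terminal: must match one token
            return int(j == i + 1 and tokens[i] == sym)
        total = 0
        for rhs in RULES[sym]:
            if len(rhs) == 1:
                total += count(rhs[0], i, j)
            else:                             # binary rule: try every split point
                total += sum(count(rhs[0], i, m) * count(rhs[1], m, j)
                             for m in range(i + 1, j))
        return total

    return count(start, 0, len(tokens))


one_pp = "*n *v *det *n *prep *det *n".split()
two_pps = "*n *v *det *n *prep *det *n *prep *det *n".split()
print(count_parses(one_pp), count_parses(two_pps))
```

Because every split point lies strictly inside the span, the left-recursive rules of G_4 cause no infinite recursion here; the routine would need extra care for grammars with empty or cyclic productions.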

The shared forest (represented by a parse list) in Figure 12.10 had two 
root nodes. In order to obtain a packed shared forest, the two nodes 

15 [S (1 14)] 

16 [S (7 12)] 

have to be replaced by a single node 

15 [S (1 14) (7 12)]. 

We need to adapt the algorithm, so as to make sure that a packed node 
in the packed shared forest corresponds to a symbol node in the graph- 
structured stack. 

• Whenever a state vertex is preceded by several symbol vertices that refer 
to (different entries of) the same grammar symbol, these symbol vertices 
are merged into a single vertex. The corresponding entries in the parse list 
are merged into a single entry, representing a packed node in the packed 
shared forest. 

This is illustrated in Figure 12.13. 
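
The merging of parse list entries can be sketched as follows. The representation (a dictionary from index to a symbol plus a list of daughter tuples) and the function name `pack` are assumptions made for illustration; the check that both nodes cover the same part of the sentence is left to the caller, as it follows from their position in the stack.

```python
def pack(parse_list, keep, drop):
    """Merge entry `drop` into entry `keep`; both must carry the same symbol."""
    sym_keep, subs_keep = parse_list[keep]
    sym_drop, subs_drop = parse_list[drop]
    if sym_keep != sym_drop:
        raise ValueError("only nodes with the same grammar symbol can be packed")
    merged = dict(parse_list)                 # leave the input untouched
    merged[keep] = (sym_keep, subs_keep + subs_drop)
    del merged[drop]
    return merged


# The two root nodes of the example, each with a single sub-node.
parse_list = {15: ("S", [(1, 14)]), 16: ("S", [(7, 12)])}
packed = pack(parse_list, 15, 16)
print(packed)
```

After packing, node 15 holds both daughter lists, which corresponds to the entry 15 [S (1 14) (7 12)] shown above.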

In a graph-theoretically more elegant description, a packed shared forest 
should be defined as a bipartite directed graph: a graph with two distinct 
types of nodes and edges only between nodes of different types. To that 
end, assume that every node is a packed node. “Ordinary” nodes, then, are 
packed nodes with only a single sub-node. Moreover, consider packed nodes 
and sub-nodes as separate nodes; a packed node has edges to each of its 
sub-nodes. A sub-node has edges to its packed successor nodes. Such an 
approach is taken by Rekers [1992], who uses symbol nodes for the packed 
nodes and rule nodes, labelled with the applicable rewrite rule, for the 
sub-nodes. Based on this bipartite graph structure, Rekers optimizes the packing 
of the forest and extends his GLR parser to the class of reduced context-free 
grammars (Tomita’s algorithm cannot handle certain kinds of grammars, cf. 
Section 12.6). 

For the current exposition, we will follow the informal approach of Tomita. 

As an example, we will look at a few interesting situations that occur 
while parsing “I saw a man in the park with a telescope.” Each figure contains 

• the graph-structured stack; 

• at each top of the stack, the next action(s) that have to be performed; 

• the parse list representation of the packed shared forest. 

We have labelled the parse list with letters, rather than numbers, because 
numbers are used in the graph-structured stack to indicate states. 

The first ambiguity occurs when “I saw a man” has been processed, cf. 
Figure 12.14. In the parsing table in Figure 12.6 we find action[12,*prep] = 
{sh6, re7}. Hence, while we await the shift on one branch of the stack, 
reductions of VP→*v NP and S→NP VP are carried out on another branch, 
cf. Figure 12.15. 

Both tops of the stack are to shift to state 6 now, and the branches can 
be merged. After shifting *prep, *det, and *n, and reducing NP→*det *n 
the situation in Figure 12.16 is obtained. 

As we carry out re6, we have to add a PP to the state vertices that are 4 
positions down from the top of the stack. We find two different state vertices 
(labelled 12 and 1), and both must be extended with a PP symbol vertex. 
The result of this reduction is shown in Figure 12.17. Note that goto[12,PP] 
= 9 and goto[1,PP] = 5, hence the two new branches of the stacks cannot be 
merged. But, as both branches contain the same PP “in the park”, the two 
symbol vertices are labelled with the same entry in the parse list. 

After all further reductions are carried out, and two S vertices covering “I 
saw a man in the park” are merged into a single vertex, we get the situation 
that is shown in Figure 12.18. 

Parsing continues in similar fashion with the next PP “with a telescope”. 
After the last word has been shifted, branches of the stack synchronize on 
accept, rather than shift. The final situation is shown in Figure 12.19. 

Fig. 12.17. “I saw a man in the park . . .” 

Fig. 12.19. “I saw a man in the park with a telescope.” 



12.5 A formal definition of Tomita’s algorithm 

We give a formal definition of Tomita’s algorithm in the style of [Tomita, 
1985]. The reason for writing out this definition is that it is a starting point 
for the formal definition of our PBT algorithm in Chapter 13. 

A minor error in Tomita’s algorithm has been repaired: a set of top 
nodes, rather than a single top node, is returned. Different nodes for a single 
constituent cannot be shared when these lead to different states of the parser. 
This may also apply to roots of the parse tree. This enhancement is due to 
Lankhorst [1991], who also gives the following example. Take the following 
grammar 

S → A A 
A → a 
A → ε. 

The resulting parse list for a string a is: 

1 [A]          4 [a]          7 [S (1,5)] 
2 [A]          5 [A (4)]      8 [A] 
3 [S (1,2)]    6 [A (4)]      9 [S (6,8)]. 

The result delivered by Tomita’s original algorithm is node 9 as a root of the 
parse forest, being the last node found by an accept action. Node 7 is also a 
root, however, and therefore should also be delivered as result. 

In the description of the algorithm the arrows are directed from right to 
left (in the illustrations in the previous section). A top of the stack is a source 
of the graph, the bottom of the stack is the sink. This is counterintuitive, 
perhaps, but has some advantages for implementation. 

In the formal description we use the following functions and global 
variables: 

Γ: graph-structured stack. This is a directed, acyclic graph with a single 
    leaf node, v_0, labelled with state number s_0. Γ is initialized in PARSE 
    and altered in REDUCER, E-REDUCER, and SHIFTER. 

T: shared packed forest. This is a directed graph (V, E) in which each vertex 
    v ∈ V may have more than one successor list (v, L) ∈ E. Initialized in 
    PARSE and altered in REDUCER, E-REDUCER, and SHIFTER. 

r: the result. This is a set of vertices of T which refer to the roots of the 
    parse forest. Initialized in PARSE and altered in ACTOR. 

U_{i,j}: set of vertices of Γ; U_{i,0} is created when parsing a_i, U_{i,j} 
    with j > 0 when parsing the j-th ε after a_i. 

A: subset of “active” vertices of U_{i,j} on which reductions and shift 
    actions can be carried out. A is initialized in PARSEWORD and altered in 
    ACTOR and E-REDUCER. 

R: set of edges to be reduced. Each element is a triple (v, x, p) with 
    v ∈ U_{i,j}, x ∈ SUCCESSORS(v) and p a non-empty production of G. 
    (v, x, p) ∈ R means that reduce p is to be applied on the path starting 
    with the edge from v to x. REDUCER will take care of it. R is initialized 
    in PARSEWORD and altered in ACTOR and REDUCER. 

R_e: set of vertices on which an ε-reduction is to be carried out. Each 
    element is a pair (v, p) with v ∈ U_{i,j} and p an ε-production. 
    (v, p) ∈ R_e means that reduce p is to be applied on the vertex v. 
    E-REDUCER will carry out this reduction. R_e is initialized in PARSEWORD 
    and altered in ACTOR and E-REDUCER. 

Q: set of vertices to be shifted on. (v, s) ∈ Q means that shift s is to 
    be carried out on v. SHIFTER will take care of this. Q is initialized in 
    PARSEWORD and altered in ACTOR and SHIFTER. 

LEFT(p): left-hand side of production p. 

|p|: length of the right-hand side of production p. 

STATE(v): takes a vertex in Γ as its argument and returns the state label of 
    this vertex. 

SYMBOL(x): takes a vertex in Γ as its argument and returns the symbol label 
    of this vertex. This label is a link to a vertex in T. 

SUCCESSORS(v): takes a vertex in Γ as its argument and returns the set of 
    all vertices x in Γ such that there is an edge from v to x. 

GOTO(s, N): looks up the goto table and returns a state number. s is a state 
    number and N is a grammar symbol. 

ACTION(s): looks up the action table and returns a set of actions. s is a 
    state number. 

ADDSUBNODE(v, L): takes a vertex v in T and a successor list L as arguments 
    and adds (v, L) to E in T = (V, E). 

The parser is defined by the following set of procedures. 

procedure PARSE(G, a_1 ... a_n) 
begin 
    Γ := ∅; T := ∅; r := ∅; 
    create in Γ a vertex v_0 labelled s_0; 
    U_{0,0} := {v_0}; 
    for i := 0 to n do PARSEWORD(i) od; 
    return r, the set of roots of the parse forest 
end PARSE; 

procedure PARSEWORD(i) 
begin 
    j := 0; A := U_{i,0}; 
    Q := ∅; R := ∅; R_e := ∅; 
    repeat 
        if A ≠ ∅ then ACTOR 
        elseif R ≠ ∅ then REDUCER 
        elseif R_e ≠ ∅ then E-REDUCER 
        fi 
    until A = ∅ and R = ∅ and R_e = ∅; 
    SHIFTER 
end PARSEWORD; 

procedure ACTOR 
begin 
    remove one element v from A; 
    for all α ∈ ACTION(STATE(v)) 
    do if α = accept 
       then r := r ∪ {v}; 
       elseif α = shift 
       then Q := Q ∪ {(v, GOTO(STATE(v), a_{i+1}))} 
       elseif α = reduce p and p is not an ε-production 
       then for all x ∈ SUCCESSORS(v) 
            do R := R ∪ {(v, x, p)} od 
       elseif α = reduce p and p is an ε-production 
       then R_e := R_e ∪ {(v, p)} 
       fi 
    od 
end ACTOR; 

procedure REDUCER 
begin 
    remove one element (v, x, p) from R; 
    N := LEFT(p); 
    for all y such that there is a path of length 2|p| − 2 from x to y 
    do L := (SYMBOL(z_1), ..., SYMBOL(z_|p|)), where 
            z_1 = x, z_|p| = y and z_2, ..., z_{|p|−1} are 
            symbol vertices on the path from x to y; 
       for all s such that 
            ∃w (w ∈ SUCCESSORS(y) ∧ GOTO(STATE(w), N) = s) 
       do W := {w | w ∈ SUCCESSORS(y) ∧ GOTO(STATE(w), N) = s}; 
          if ∃u (u ∈ U_{i,j} ∧ STATE(u) = s) 
          then if there is an edge from u to a vertex z 
                  such that SUCCESSORS(z) = W 
               then ADDSUBNODE(SYMBOL(z), L) 
               else create in T a node m labelled N; 
                    ADDSUBNODE(m, L); 
                    create in Γ a vertex z labelled m; 
                    create in Γ an edge from u to z; 
                    for all w ∈ W 
                    do create in Γ an edge from z to w od; 
                    if u ∉ A 
                    then for all q such that 
                              reduce q ∈ ACTION(s) 
                              and q is not an ε-production 
                         do R := R ∪ {(u, z, q)} od 
                    fi 
               fi 
          else create in T a node m labelled N; 
               ADDSUBNODE(m, L); 
               create in Γ two vertices u and z 
                    labelled s and m, respectively; 
               create in Γ an edge from u to z; 
               for all w ∈ W 
               do create in Γ an edge from z to w od; 
               U_{i,j} := U_{i,j} ∪ {u}; 
               A := A ∪ {u} 
          fi 
       od 
    od 
end REDUCER; 

procedure E-REDUCER 
begin 
    U_{i,j+1} := ∅; 
    for all s such that 
         ∃(v, p) ∈ R_e such that GOTO(STATE(v), LEFT(p)) = s 
    do N := LEFT(p); 
       create in T a node m labelled N; 
       ADDSUBNODE(m, NIL); 
       create in Γ two vertices u and z labelled s and m, respectively; 
       create in Γ an edge from u to z; 
       U_{i,j+1} := U_{i,j+1} ∪ {u}; 
       for all (v, p) ∈ R_e such that GOTO(STATE(v), LEFT(p)) = s 
       do create in Γ an edge from z to v od 
    od; 
    R_e := ∅; 
    A := U_{i,j+1}; 
    j := j + 1 
end E-REDUCER; 

procedure SHIFTER 
begin 
    U_{i+1,0} := ∅; 
    create in T a node m labelled a_{i+1}; 
    for all s such that ∃v ((v, s) ∈ Q) 
    do create in Γ two vertices w and x labelled s and m, respectively; 
       create in Γ an edge from w to x; 
       U_{i+1,0} := U_{i+1,0} ∪ {w}; 
       for all v such that (v, s) ∈ Q 
       do create in Γ an edge from x to v od 
    od 
end SHIFTER; 



12.6 Pros and cons of Tomita’s algorithm 

We will first review the efficiency of Tomita’s algorithm, and then discuss 
some limitations and extensions. 

Tomita claims his algorithm to be five times faster than Earley’s original 
algorithm [Earley, 1970] and two times faster than the improved version of 
Graham, Harrison and Ruzzo [1980], based on experiments with context-free 
grammars for (parts of) the English language. A worst-case analysis is 
somewhat more involved. Earley’s algorithm has O(n^3) worst-case complexity 
for a sentence of length n. The worst-case complexity of Tomita’s algorithm 
depends on the length of the right-hand sides of the productions of the 
grammar. Let q be the length of the longest right-hand side of a production. 
Then the worst-case complexity of Tomita’s algorithm is O(n^{q+1}). Johnson 
[1989] gives an argument for this complexity based on the number of edges in 
a packed shared forest for very ambiguous grammars. A constructive way to 
derive this complexity bound is the following. 

We can divide the set of state vertices U in the graph-structured stack 
at any time into subsets U_0, ..., U_k, where k is the number of words that 
have been scanned. U_i contains those state vertices that have been created 
between scanning word i and word i + 1. The size of U_i is limited by a 
constant (the number of states). Suppose, now, that a reduction has to be 
carried out on a top of the stack v ∈ U_k, for a production with q right-hand 
side symbols. Then all paths from v with length 2q have to be followed, in 
order to determine the ancestors¹ (the vertices onto which the left-hand side 
symbol has to be shifted). 

How many paths of length 2q from v could exist? Because we have merged 
corresponding symbol vertices preceding a state vertex, there is only one edge 
from each state vertex to its preceding symbol vertex. Thus we ignore the 
symbol vertices and move directly from state vertex to state vertex. Retracing 
the right-hand side, we have to move the dot back over all q symbols. When 
the grammar is sufficiently ambiguous, for a state vertex in U_j its successor 
state vertex can be located in any U_i with 0 ≤ i ≤ j. Starting in U_k, and 
doing this q times, we find O(k^q) possibilities. Hence the total cost for the 
reduction of a vertex in U_k is O(k^q). 

As the size of U_k is O(1), all reductions in U_k can be handled in O(k^q) 
time. As we have to do this for k ranging from 0 to n, we find a total time 
complexity for all n + 1 positions of O(n^{q+1}). 
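
Written out, the summation behind this bound is the following (a sketch of the arithmetic only, with q the length of the longest right-hand side as defined above and an O(k^q) cost per position k):

```latex
\sum_{k=0}^{n} O\!\left(k^{q}\right)
  \;\le\; (n+1)\cdot O\!\left(n^{q}\right)
  \;=\; O\!\left(n^{q+1}\right)
```

For a grammar whose right-hand sides have at most two symbols (q = 2), such as G_4, this gives the same cubic bound as Earley's algorithm.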

It has been remarked by Kipps [1989] that a Tomita recognizer can be 
constructed with a worst-case complexity O(n^3). Using a more sophisticated 
graph search algorithm, the O(n) ancestors of a vertex that has to be reduced 
can be found in O(n^2) time. The price for a reduction of the worst-case 
complexity is high, however. On any grammar that is not nearly worst-case, 
the computing time will only increase because of the extra administration 
and the unnecessary sophistication of the graph search algorithm. Also, the 
problem that the packed shared forest may extend beyond O(n^3) is not solved. 
But the same problem applies to Earley’s algorithm when a packed shared 
forest has to be constructed from the completed chart. In order to make 
sure that the size of the forest is O(n^3) in the worst case, one can share 
corresponding prefixes of right-hand sides as well; cf. Leermakers [1991] and 
Billot and Lang [1989]. 

¹ These are called ancestors by Kipps [1989]. Because the edges of the graphs 
point in reverse direction (cf. Section 12.5, which follows Tomita [1985] in 
that respect), in graph theory terminology these should be called descendants. 
Ancestor is the more appropriate name, it seems, because an ancestor is older 
(put on the stack earlier) than the vertex of which it is an ancestor. 

From the above discussion it is clear that Tomita’s algorithm is superior to 
Earley and GHR on “easy” grammars, but inferior on “difficult” grammars. 
Tomita claims that all natural language grammars are easy, i.e., almost LR 
and almost ε-free. We do not know of an empirical study that has 
systematically tested Tomita’s algorithm against GHR for a large variety of natural 
language grammars. 

Not all context-free grammars can be parsed by Tomita’s algorithm. There 
are two classes of grammars for which the algorithm does not terminate: cyclic 
grammars and hidden left-recursive grammars. We will briefly discuss each 
case. 

A grammar is cyclic if A ⇒⁺ A for some nonterminal A ∈ N. The problem 
is clear: whenever an A is put onto the stack, no further shift takes place as 
the algorithm never stops reducing ever more A’s. 

A more subtle class of grammars that defeats the algorithm is that of hidden 
left-recursive grammars.² A grammar is hidden left-recursive if there are 
A, B, α, β such that 

(i) A ⇒* BαAβ, 

(ii) Bα ⇒* ε. 

When β ⇒* ε the grammar is cyclic, but in general it is not necessarily the 
case that β rewrites to ε. Consider the grammar defined by the productions 

{S → ASb, S → a, A → ε}. 

The parser sees an a as the first word. How many times should A→ε be 
reduced before we do the first shift? In order to deal with arbitrary sentences 
of the form ab*, an infinite number of reductions would be needed. This is 
reflected by the parsing table for this grammar, which remains in the same 
state after reducing A→ε. 
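
Whether a grammar has this property can be tested mechanically before a parser is built. The Python sketch below encodes one reading of the definition: a production contributes a "hidden" left-corner edge when a nonterminal follows a non-empty prefix that can derive the empty string, and the grammar is flagged when such an edge lies on a cycle. The function names are hypothetical, and the check deliberately also flags the cyclic case mentioned above.

```python
def nullable_set(grammar):
    """Nonterminals that can derive the empty string."""
    nullable, changed = set(), True
    while changed:
        changed = False
        for lhs, rhs in grammar:
            if lhs not in nullable and all(x in nullable for x in rhs):
                nullable.add(lhs)
                changed = True
    return nullable


def is_hidden_left_recursive(grammar):
    nts = {lhs for lhs, _ in grammar}
    nullable = nullable_set(grammar)
    edges = {a: set() for a in nts}           # left-corner relation
    hidden = set()                            # edges behind a nullable prefix
    for lhs, rhs in grammar:
        for k, x in enumerate(rhs):
            if x in nts:
                edges[lhs].add(x)
                if k > 0:                     # non-empty prefix, all nullable
                    hidden.add((lhs, x))
            if x not in nullable:
                break                         # later symbols are no left corners

    def reaches(src, dst):
        seen, work = {src}, [src]
        while work:
            for nxt in edges[work.pop()]:
                if nxt not in seen:
                    seen.add(nxt)
                    work.append(nxt)
        return dst in seen

    return any(reaches(c, a) for a, c in hidden)


EXAMPLE = [("S", ("A", "S", "b")), ("S", ("a",)), ("A", ())]
G4 = [
    ("S", ("NP", "VP")), ("S", ("S", "PP")),
    ("NP", ("*n",)), ("NP", ("*det", "*n")), ("NP", ("NP", "PP")),
    ("PP", ("*prep", "NP")), ("VP", ("*v", "NP")),
]
print(is_hidden_left_recursive(EXAMPLE), is_hidden_left_recursive(G4))
```

The example grammar is flagged because S reappears after the nullable prefix A, whereas G_4 is only ordinarily left-recursive and has no empty productions.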



² The term hidden left-recursive is due to Nederhof [1993]. Nozohoor-Farshi 
[1989] called such grammars ill-formed, for want of a better word. In 
[Lankhorst and Sikkel, 1991], [Sikkel and Lankhorst, 1992] we called them 
pseudo-cyclic. 

One could wonder whether hidden left-recursive grammars are relevant to 
natural language parsing. Nederhof and Sarbo [1993] report having found a 
grammar for Dutch, the Deltra grammar developed at the Delft University 
of Technology [Schoorl and Beider, 1990], that has a hidden left-recursive 
context-free backbone. 

The problem with hidden left-recursive grammars, which was overlooked 
by Tomita [1985], has been solved by Nozohoor-Farshi [1989]. He introduces a 
cycle in the stack which can be unrolled as many times as needed. A more fun- 
damental solution is proposed by Nederhof and Sarbo [1993]. They leave the 
stack acyclic and make it optional whether the stack contains nullable right- 
hand side symbols in a reduction. Rekers [1992] has eliminated the problem 
of hidden left-recursion in yet another way, by optimizing the sharing of the 
graph-structured stack. The A’s in the infinite sequence all describe the 
empty string at position 0. Hence, in Rekers’ optimally shared stack, an 
infinite sequence of state vertices that would be generated by Tomita 
collapses into a single state vertex. Like the algorithms of Nederhof and 
Sarbo [1993] and Rekers [1992], the PBT algorithm that will be discussed in 
Chapter 13 can deal with arbitrary (reduced) context-free grammars. 

Generalized LR parsing has been extended to context-sensitive grammars 
by Harkema and Tomita [1991]. Other papers on Tomita’s algorithm can be 
found in [Tomita, 1991] and [Heemels et al., 1991]. 



12.7 An annotated version of Tomita’s algorithm 

We annotate Tomita’s parse stack with Earley items. For a fair comparison 
with the Earley chart parser, we use a generalized LR(0) parser, without 
look-ahead. In Section 12.8, subsequently, we will define parsing schemata 
for LR(0) and SLR(1), based on the items with which the stack is annotated 
here. 

The canonical Tomita parser is based on (generalized) SLR(1). We start 
with a slightly different Tomita parser, based on LR(0), because for this one 
it is easiest to derive a parsing schema. Moreover, the LR(0) Tomita parser 
is the basis for constructing the parallel Tomita parser in the next chapter. 

There are a few subtle differences between LR(0) parsers on the one hand
and all other LR parsers on the other hand. No look-ahead is used, hence the
type of the next action is determined only by the top state of the stack. If
shift is a possible action, the next state depends also on the particular symbol
that is shifted. To that end, the goto table covers nonterminal and terminal
symbols alike. Whenever a symbol is pushed onto the stack, the combination
of state and symbol determines the next state. The error action no longer
exists. From the construction of the parsing table it follows that each
state has some valid action. Errors occur, however, when a shift is done but
there is no next state for the symbol that is shifted. Then the shift is cancelled
and the branch of the stack on which a shift was tried can be removed. An
annotated LR(0) parsing table for grammar G4 is shown in Figure 12.20.
Note that the accept is in fact disguised as a shift. If a shift is decided upon
and the goto table yields acc, the parser moves to a special accept state that
is not shown in the parsing table. Alternatively, one could explicitly include
a state $\{S' \to S\$\bullet\}$ and offer accept as the only possible action in that state.

Fig. 12.20. An annotated LR(0) parsing table for G4

The class of deterministic SLR(1) grammars is strictly larger than the
class of deterministic LR(0) grammars. This is exemplified by G3 (cf. Figure 12.1
on page 257). The SLR(1) table has no ambiguities. In an LR(0)
table, state 5 would offer both sh and re 5. Without look-ahead one cannot
deterministically decide whether the verb has a direct object.

Having introduced the annotated LR(0) parsing table, we can now give an
explicit correspondence between the parse stack and LR(0) items on the one
hand and Earley-type items on the other hand. The latter ones, having the
general format $[A \to \alpha \bullet \beta, i, j]$, are called marked LR(0) items in this context.

We will first introduce an annotated LR(0) Tomita parser that incorporates 
marked items into its parse stack, and then derive a parsing schema for the 
domain of marked items that is implemented by an LR(0) Tomita parser. 

Let G be a context-free grammar and G' its augmented grammar. The
set of marked items for G' is defined by

$$\mathcal{I}_{LR(0)} = \{[A \to \alpha \bullet \beta, i, j] \mid A \to \alpha\beta \in P' \wedge 0 \le i \le j\}. \tag{12.17}$$

The graph-structured stack can be described as a bipartite directed graph
$\Gamma = (U, Y; E)$, where U is the set of state vertices, Y the set of symbol
vertices, and E the set of edges connecting vertices to one another. For the
sake of simplicity, we run the algorithm only as a recognizer. Hence, symbol
vertices are labelled with grammar symbols and no parse list is produced. We
write SYMBOL(y) for the label of a symbol vertex $y \in Y$. We write STATE(u)
for the state with which a state vertex $u \in U$ is labelled. The set of state
vertices U that is used for parsing a sentence $a_1 \ldots a_n$ can be partitioned into
$U_0 \cup \ldots \cup U_n$. The subset $U_i$ contains those state vertices that are put onto
the stack when the words $a_{i+1} \ldots a_n$ remain on the input.
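
The bipartite structure described above can be sketched as a small data structure. This is only an illustration under stated assumptions: the class names `StateVertex`, `SymbolVertex` and `GraphStack`, the method `push`, and the policy of sharing a state vertex whenever state and position coincide are hypothetical choices made here, not a formal definition of Tomita's stack. Edges point from the tops of the stack towards the root.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class SymbolVertex:
    symbol: str
    successors: list = field(default_factory=list)  # state vertices nearer the root


@dataclass(eq=False)
class StateVertex:
    state: int
    position: int  # the j such that this vertex belongs to U_j
    successors: list = field(default_factory=list)  # symbol vertices nearer the root


class GraphStack:
    def __init__(self, initial_state=0):
        self.root = StateVertex(initial_state, 0)
        self.U = {0: [self.root]}  # the partition U_0, U_1, ...

    def push(self, below, symbol, state, position):
        """Put a symbol vertex on top of `below` and a state vertex on top of that.

        A state vertex with the same state in the same subset U_position is
        shared rather than duplicated (an assumption of this sketch).
        """
        y = SymbolVertex(symbol, [below])
        layer = self.U.setdefault(position, [])
        for u in layer:
            if u.state == state:
                break
        else:
            u = StateVertex(state, position)
            layer.append(u)
        u.successors.append(y)
        return u
```

Pushing two different symbols that lead to the same state at the same position yields one shared state vertex with two symbol vertices below it, which is the sharing that makes the stack a graph rather than a tree.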

The Annotated LR(0) Tomita algorithm is obtained from the LR(0) 
Tomita algorithm by two simple changes in the way the stack is maintained. 
Firstly, when a reduction is carried out there is no need to delete the part of 
the stack that is being reduced. We can simply leave it in the graph and start 
a new branch from the appropriate state vertex. It is remarked by Tomita 
[1985] that this does not change the algorithm in any way (and in fact Tomita 
doesn’t prune branches of the graph either), only the presentation of what a 
graph-structured stack looks like is different. 

Secondly, we will label the state vertices with sets of marked items, denoted
ITEMS(u) for any $u \in U_j \subseteq U$. For every LR(0) item $A \to \alpha \bullet \beta \in$ STATE(u),
we add one (sometimes a few) marked item $[A \to \alpha \bullet \beta, i, j]$ to
ITEMS(u). We have to determine, however, which position markers should
be contained in the marked item. This is done as follows.

• The right position marker corresponds to the subset $U_j$ of U.
  That is, if $[A \to \alpha \bullet \beta, i, j] \in$ ITEMS(u) then $u \in U_j$.

• For initial items, the left and right position markers coincide.
  That is, if $[A \to \bullet \beta, i, j] \in$ ITEMS(u) then $i = j$.

• For non-initial items, the left position marker is determined as follows.
  Let $A \to \alpha X \bullet \beta \in$ STATE(u); then u is the predecessor† of a symbol vertex y
  with SYMBOL(y) = X. For each state vertex v that is a successor of y it
  holds that ITEMS(v) contains some Earley item $[A \to \alpha \bullet X\beta, i, k]$.
  For all successors v of y and for all values of i such that $[A \to \alpha \bullet X\beta, i, k] \in$
  ITEMS(v), an Earley item $[A \to \alpha X \bullet \beta, i, j]$ is added to ITEMS(u).

Fig. 12.21. Annotated stack for I saw a man ...

As output of the annotated Tomita parser we will consider the marked LR(0)
items that appear in the final graph-structured stack, rather than the parse
list. In Figure 12.21 and Figure 12.22 the annotated graph-structured stack
is shown for “I saw a man with a telescope”.

Definition 12.1. (LR(0)-viable items) A marked LR(0) item $[A \to \alpha \bullet \beta, i, j]$
is called LR(0)-viable for a string $a_1 \ldots a_n$ if there is some $z \in \Sigma^*$ such that

(i) $S' \Rightarrow^* a_1 \ldots a_i A z \$$,

(ii) $\alpha \Rightarrow^* a_{i+1} \ldots a_j$. □

In the sequel we will prove that a final stack of the annotated LR(0) Tomita
parser contains all viable marked items and no other ones. But first we
recapitulate (in a much simplified form) the essential notions of parsing schemata
and parsing systems.

† We assume here that edges are directed from right to left, i.e., from the tops of
the stack towards the root. Because of the way in which the stack is constructed
(and the standard way to depict a stack with the root at the left and the tops at
the right) this seems the wrong way around. This “reversed” direction of edges
is chosen because of some implementation details that do not matter right here.
We stick to this terminology here to be compatible with the formal definition of
Tomita’s algorithm that has been presented in 12.5.

12.8 Parsing Schemata for LR(0) and SLR(1)

A parsing system for some grammar G and string $a_1 \ldots a_n$ is a triple $\mathbb{P} =
(\mathcal{I}, H, D)$ with $\mathcal{I}$ a set of items, H an initial set of items and D a set of
deduction steps that allow new items to be derived from already known items.
The set of initial items H encodes the sentence that is to be parsed. For a
sentence $a_1 \ldots a_n$ we take

$$H = \{[a_1, 0, 1], \ldots, [a_n, n-1, n], [\$, n, n+1]\}. \tag{12.18}$$

The item set $\mathcal{I}$ for an LR(0) parsing system has been specified in (12.17) on
page 280. Deduction steps in D are of the form

$$\eta_1, \ldots, \eta_k \vdash \xi.$$

The items $\eta_1, \ldots, \eta_k$ are called the antecedents and the item $\xi$ is called the
consequent of a deduction step. If all antecedents of a deduction step are
recognized by a parser, then the consequent should also be recognized. An
item $[A \to \alpha \bullet \beta, i, j] \in \mathcal{I}$ is valid in $\mathbb{P}$ if it can be recognized from the initial
set H by applying a sequence of deduction steps.

A parsing system $\mathbb{P}$ is defined for a particular grammar and string. An
uninstantiated parsing system only defines $\mathcal{I}$ and D for a particular grammar
G. Such a system can be instantiated by adding a set of hypotheses for a
particular string $a_1 \ldots a_n$. A parsing schema is defined for a class of grammars.
For each grammar in this class, it defines an uninstantiated parsing system.

Let us now define a parsing schema LR(0), abstracting from all the
algorithmic details of an annotated LR(0) Tomita parser. The schema is
defined for reduced acyclic context-free grammars without hidden left-recursion.
We specify the parsing schema by defining a parsing system $\mathbb{P}_{LR(0)} =
(\mathcal{I}_{LR(0)}, H, D_{LR(0)})$ for an arbitrary grammar G and string $a_1 \ldots a_n$. $\mathcal{I}_{LR(0)}$
and H have already been defined above, so we only have to determine the
set of deduction steps $D_{LR(0)}$ that is implemented by our annotated Tomita
parser. D can be divided into distinct subsets.

Initial LR(0) items are contained in a state of the parser because they are
contained in the closure of some non-initial item. Similarly, initial marked
items in ITEMS(u) of some state vertex u can be computed by a closure
operation on marked items. The set of deduction steps that describes all
closures is specified by:†

$$D^{Cl} = \{[A \to \alpha \bullet B\beta, i, j] \vdash [B \to \bullet\gamma, j, j]\}. \tag{12.19}$$

If the string $\gamma$ in (12.19) starts with a nonterminal, then $[B \to \bullet\gamma, j, j]$ is the
antecedent of another closure step. Hence we do not need to specify explicitly
that the transitive closure has to be taken, as in the algorithm in Figure 12.2.

In order to start the parser, we need an initial deduction step without
antecedents:

$$D^{Init} = \{\vdash [S' \to \bullet S\$, 0, 0]\}. \tag{12.20}$$

The other marked items of the initial vertex $u_0$ can be deduced from
$[S' \to \bullet S\$, 0, 0]$ by deduction steps in $D^{Cl}$.

A shift action is feasible in a state that contains an LR(0) item $A \to \alpha \bullet a\beta$.
I.e., a shift is possible from a state vertex having a marked item $[A \to \alpha \bullet a\beta, i, j]$.
The shift is successful for this particular item only if the next word of input
is indeed a. Thus we obtain the set of shift deduction steps:

$$D^{Sh} = \{[A \to \alpha \bullet a\beta, i, j], [a, j, j+1] \vdash [A \to \alpha a \bullet \beta, i, j+1]\}. \tag{12.21}$$

† A remark on the notation of (12.19): all items that occur in a deduction step
must, by definition, be taken from $\mathcal{I}$ or H. Hence conditions like, e.g., $B \to \gamma \in P'$
need not be stated again. So, in this case, the entire right part of the usual set
notation {. . . | . . .} is absent.

Finally we turn to the most difficult case, the reduction. A reduce action is
possible in a state that contains a final LR(0) item, i.e., a reduce is possible
from a state vertex that contains a final marked item. Let $B \to \gamma\bullet \in$
STATE(u) for some u, with $\gamma = X_1 \ldots X_k$. Then we can retrace a path of
symbol and state vertices labelled with (among others)

$$[B \to X_1 \ldots X_k \bullet, i, j],\ X_k,\ [B \to X_1 \ldots X_{k-1} \bullet X_k, i, j_{k-1}],\ \ldots,\ X_1,\ [B \to \bullet X_1 \ldots X_k, i, i].$$

Let v be the vertex such that $[B \to \bullet\gamma, i, i] \in$ ITEMS(v). Then there must be
a non-final marked item in the same item set such that $[B \to \bullet\gamma, i, i]$ can be
derived from it by closure steps. Assume

$$[A \to \alpha \bullet B\beta, h, i] \in \mathrm{ITEMS}(v).$$

Then from v we have to extend the stack with a symbol vertex labelled
B that has v as its successor and a predecessor state vertex w such that
$[A \to \alpha B \bullet \beta, h, j] \in$ ITEMS(w). All the intermediate vertices that were retraced
in order to find v are not essential for the reduction.† Hence, the essential
properties of a reduction are covered by the set of deduction steps

$$D^{Re} = \{[A \to \alpha \bullet B\beta, h, i], [B \to \gamma\bullet, i, j] \vdash [A \to \alpha B \bullet \beta, h, j]\}. \tag{12.22}$$

Now we have enumerated all deduction steps that specify how marked items
are added to the graph-structured stack of the annotated LR(0) Tomita
parser. This is summarized in the following parsing schema.

† These vertices are not essential in the sense that they provide merely a data
structure that allows vertices v satisfying $[A \to \alpha \bullet B\beta, h, i] \in$ ITEMS(v) to be retrieved.
Data structures are abstracted from in parsing schemata, hence the steps that
need to be taken to find such vertices v do not show up in the deduction step.
Searching the intermediate vertices is essential for the (in)efficiency of Tomita’s
algorithm when a massively ambiguous grammar with long right-hand sides is
used, cf. Section 12.6. I.e., for such grammars a graph-structured stack is an
inefficient implementation of the schema LR(0).

Schema 12.2. (LR(0))

The parsing schema LR(0) is defined for reduced acyclic context-free
grammars without hidden left-recursion. Let G be such a grammar and G' the
augmented grammar. A parsing system

$$\mathbb{P}_{LR(0)} = (\mathcal{I}_{LR(0)}, H, D_{LR(0)})$$

is defined by

$$\mathcal{I}_{LR(0)} = \{[A \to \alpha \bullet \beta, i, j] \mid A \to \alpha\beta \in P' \wedge 0 \le i \le j\};$$

$$D^{Init} = \{\vdash [S' \to \bullet S\$, 0, 0]\},$$

$$D^{Cl} = \{[A \to \alpha \bullet B\beta, i, j] \vdash [B \to \bullet\gamma, j, j]\},$$

$$D^{Sh} = \{[A \to \alpha \bullet a\beta, i, j], [a, j, j+1] \vdash [A \to \alpha a \bullet \beta, i, j+1]\},$$

$$D^{Re} = \{[A \to \alpha \bullet B\beta, h, i], [B \to \gamma\bullet, i, j] \vdash [A \to \alpha B \bullet \beta, h, j]\},$$

$$D_{LR(0)} = D^{Init} \cup D^{Cl} \cup D^{Sh} \cup D^{Re}.$$

The set of hypotheses H depends on the input string, cf. (12.18) on page 282.

□
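
To make the four kinds of deduction steps concrete, the following sketch computes the set of valid items of an LR(0) parsing system by exhaustive deduction with an agenda. It abstracts from the graph-structured stack entirely and only illustrates the schema. The function name `lr0_valid_items`, the tuple encoding `(lhs, rhs, dot, i, j)` of a marked item, and the toy grammar used in the usage note are assumptions made for this illustration.

```python
from collections import deque


def lr0_valid_items(productions, start, words):
    """Return all items deducible by Init, Closure, Shift and Reduce steps.

    An item (lhs, rhs, dot, i, j) stands for [lhs -> rhs[:dot] . rhs[dot:], i, j].
    """
    aug = [("S'", (start, "$"))] + list(productions)   # augmented grammar G'
    nonterminals = {lhs for lhs, _ in aug}
    tokens = list(words) + ["$"]                       # hypotheses [a_k, k-1, k], [$, n, n+1]
    init = ("S'", (start, "$"), 0, 0, 0)               # the Init step
    valid, agenda = {init}, deque([init])

    def add(item):
        if item not in valid:
            valid.add(item)
            agenda.append(item)

    while agenda:
        lhs, rhs, dot, i, j = agenda.popleft()
        if dot == len(rhs):
            # Reduce, with the current final item as right antecedent.
            for l2, r2, d2, h, i2 in list(valid):
                if d2 < len(r2) and r2[d2] == lhs and i2 == i:
                    add((l2, r2, d2 + 1, h, j))
            continue
        nxt = rhs[dot]
        if nxt in nonterminals:
            # Closure: predict every production for the nonterminal after the dot.
            for l2, r2 in aug:
                if l2 == nxt:
                    add((l2, r2, 0, j, j))
            # Reduce, with the current item as left antecedent.
            for l2, r2, d2, i2, j2 in list(valid):
                if l2 == nxt and d2 == len(r2) and i2 == j:
                    add((lhs, rhs, dot + 1, i, j2))
        elif j < len(tokens) and tokens[j] == nxt:
            # Shift: the next word matches the terminal after the dot.
            add((lhs, rhs, dot + 1, i, j + 1))
    return valid
```

For a toy grammar with productions S→NP VP, NP→*n, NP→*det *n and VP→*v NP, calling `lr0_valid_items` on the words `*n *v *det *n` yields, among others, the item (S', (S, $), 2, 0, 5), which corresponds to the accept disguised as a shift of the end marker. Items such as a VP prediction at position 0 are absent, which reflects the top-down filtering that the closure steps provide.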

It is not a coincidence that this schema is very similar to the schema Earley
defined in Example 4.32. The predict, scan and complete deduction steps
in the Earley schema correspond to the closure, shift and reduce steps here.
There are only two inessential differences between the parsing schemata Earley
and LR(0):

• Earley is defined for all context-free grammars, whereas LR(0) is only
defined for reduced acyclic grammars without hidden left-recursion.

• LR(0) augments the grammar with an extra production $S' \to S\$$.

Corollary 12.3.

A marked LR(0) item is valid in LR(0) for some grammar G and sentence
$a_1 \ldots a_n$ if and only if the item is LR(0)-viable for G and $a_1 \ldots a_n$ (cf.
Definition 12.1 on page 281). □



Next, we will define a parsing schema for (generalized) SLR(1) by
examining the differences between the LR(0) and SLR(1) Tomita parser. We have
not defined an algorithm for the construction of an LR(0) table, but it is
clear from the examples how this should be done. The relation between the
SLR(1) and LR(0) tables is characterized as follows.

• sh $\in$ action$_{LR(0)}[s]$ if and only if sh $s' \in$ action$_{SLR(1)}[s, a]$ for some state $s'$
and some terminal a;

• re $k \in$ action$_{LR(0)}[s]$ if and only if re $k \in$ action$_{SLR(1)}[s, a]$ for some terminal a;

• $s' \in$ goto$_{LR(0)}[s, a]$ if and only if sh $s' \in$ action$_{SLR(1)}[s, a]$;

• $s' \in$ goto$_{LR(0)}[s, A]$ if and only if $s' \in$ goto$_{SLR(1)}[s, A]$.

This leads to the following differences for the parsing schemata: 

• The closure deduction steps are identical, as the construction of the set of 
states is not affected. 

• The shift deduction steps are identical. When the LR(0) parser decides to 
shift, this will only lead to a new entry in the stack if the goto table yields 
a new state for the shifted terminal. 

• There is a difference in reduce deduction steps. In the SLR(1) case, a
reduction is carried out only if this is licensed by the look-ahead symbol. In
grammar G3, for example, as defined on page 256, the SLR(1) parser will
reduce a *v by a production VP→*v only if it is followed by an
end-of-sentence marker. The LR(0) parser always reduces *v to VP.

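These observations are laid down in the following parsing schema.

Schema 12.4. (SLR(1))

The parsing schema SLR(1) is defined for reduced acyclic context-free
grammars without hidden left-recursion. Let G be such a grammar and G' the
augmented grammar. A parsing system $\mathbb{P}_{SLR(1)} = (\mathcal{I}_{SLR(1)}, H, D_{SLR(1)})$ is
defined by

$$\mathcal{I}_{SLR(1)} = \{[A \to \alpha \bullet \beta, i, j] \mid A \to \alpha\beta \in P' \wedge 0 \le i \le j\};$$

$$D^{Init} = \{\vdash [S' \to \bullet S\$, 0, 0]\},$$

$$D^{Cl} = \{[A \to \alpha \bullet B\beta, i, j] \vdash [B \to \bullet\gamma, j, j]\},$$

$$D^{Sh} = \{[A \to \alpha \bullet a\beta, i, j], [a, j, j+1] \vdash [A \to \alpha a \bullet \beta, i, j+1]\},$$

$$D^{Re} = \{[A \to \alpha \bullet B\beta, h, i], [B \to \gamma\bullet, i, j], [a, j, j+1] \vdash [A \to \alpha B \bullet \beta, h, j] \mid a \in \mathrm{Follow}(B)\},$$

$$D_{SLR(1)} = D^{Init} \cup D^{Cl} \cup D^{Sh} \cup D^{Re}.$$

The set of hypotheses H depends on the input string, cf. (12.18) on page 282.

□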



Note that it is possible to exploit the look-ahead more efficiently, for example
by using $a \in \mathrm{First}(\beta\,\mathrm{Follow}(A))$, rather than $a \in \mathrm{Follow}(B)$, to filter
irrelevant items. Also, one could apply a filter to the scan steps. But the
schema has been defined such that it incorporates exactly the same look-ahead
that is used in the construction of an SLR(1) parsing table.



We call an item SLR(1)-valid for a grammar G and string $a_1 \ldots a_n$ if it
is a valid item in the SLR(1) parsing system for G and $a_1 \ldots a_n$.

A characterization of the set of valid items is somewhat more involved in 
the SLR(l) case than in the LR(0) case. We define a set of viable items, the 
items that ought to be recognized, and then sketch a proof that this equals 
the set of valid items. 

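Definition 12.5. (SLR(1)-viable items)

The set of SLR(1)-viable items (or viable items for short) is the smallest
subset of $\mathcal{I}_{SLR(1)}$ satisfying the following conditions:

• $[S' \to \bullet S\$, 0, 0]$ is viable;

• $[A \to \alpha a \bullet \beta, i, j+1]$ is viable if

(i) $S' \Rightarrow^* a_1 \ldots a_i A z \$$ for some z,

(ii) $\alpha a \Rightarrow^* a_{i+1} \ldots a_{j+1}$;

• $[A \to \alpha B \bullet \beta, i, k]$ is viable if there are z and $j < k$ such that

(i) $S' \Rightarrow^* a_1 \ldots a_i A z \$$,

(ii) $\alpha \Rightarrow^* a_{i+1} \ldots a_j$,

(iii) $B \Rightarrow^* a_{j+1} \ldots a_k$,

(iv) $a_{k+1} \in \mathrm{Follow}(B)$;

• $[A \to \alpha B \bullet \beta, i, j]$ is viable if

(i) $S' \Rightarrow^* a_1 \ldots a_i A z \$$ for some z,

(ii) $\alpha \Rightarrow^* a_{i+1} \ldots a_j$,

(iii) $B \Rightarrow^* \varepsilon$,

(iv) $[A \to \alpha \bullet B\beta, i, j]$ is viable,

(v) $a_{j+1} \in \mathrm{Follow}(B)$;

• $[C \to \bullet\gamma, j, j]$ is viable if there is a viable item $[A \to \alpha \bullet B\beta, i, j]$ such that
$B \Rightarrow^*_{rm} C\delta$ for some $\delta$.† □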

The recursion in this definition only relates to nullable symbols (i.e., B such
that $B \Rightarrow^* \varepsilon$). These have to be taken proper care of. Consider, for example, a
grammar G defined by productions

$$S \to ABA, \quad S \to CBC, \quad A \to a, \quad B \to \varepsilon, \quad C \to c$$

and an input string ac. Then $[S \to AB \bullet A, 0, 1]$ is not viable, even though
conditions (i), (ii), (iii) and (v) of the second last bullet are satisfied. The
deduction step

$$[S \to A \bullet BA, 0, 1],\ [B \to \bullet, 1, 1],\ [c, 1, 2] \vdash [S \to AB \bullet A, 0, 1]$$

is never activated because $[S \to A \bullet BA, 0, 1]$ cannot be recognized; reduction
of the first A is prevented by the look-ahead c.
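
The example can be replayed mechanically. The sketch below runs the deduction steps of the LR(0) and SLR(1) schemata on the grammar above and the input ac; the only difference between the two runs is the Follow filter on reduce steps. The names `follow_sets` and `valid_items`, the flag `slr1`, and the item encoding `(lhs, rhs, dot, i, j)` are assumptions of this illustration, and Follow is computed with the usual fixpoint iteration rather than by any method prescribed in the text.

```python
from collections import deque


def follow_sets(aug, nonterminals):
    """Fixpoint computation of nullable, First and Follow for the grammar `aug`."""
    nullable = set()
    first = {n: set() for n in nonterminals}
    follow = {n: set() for n in nonterminals}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in aug:
            all_nullable = True
            for x in rhs:
                new = first[x] if x in nonterminals else {x}
                if not new <= first[lhs]:
                    first[lhs] |= new
                    changed = True
                if x not in nullable:
                    all_nullable = False
                    break
            if all_nullable and lhs not in nullable:
                nullable.add(lhs)
                changed = True
            trailer = set(follow[lhs])       # what may follow the rest of rhs
            for x in reversed(rhs):
                if x in nonterminals:
                    if not trailer <= follow[x]:
                        follow[x] |= trailer
                        changed = True
                    trailer = (trailer | first[x]) if x in nullable else set(first[x])
                else:
                    trailer = {x}
    return follow


def valid_items(productions, start, words, slr1):
    aug = [("S'", (start, "$"))] + list(productions)
    nts = {lhs for lhs, _ in aug}
    follow = follow_sets(aug, nts)
    tokens = list(words) + ["$"]
    init = ("S'", (start, "$"), 0, 0, 0)
    valid, agenda = {init}, deque([init])

    def licensed(b, j):
        # SLR(1): reducing to b at right position j needs a_{j+1} in Follow(b).
        return (not slr1) or (j < len(tokens) and tokens[j] in follow[b])

    def add(item):
        if item not in valid:
            valid.add(item)
            agenda.append(item)

    while agenda:
        lhs, rhs, dot, i, j = agenda.popleft()
        if dot == len(rhs):
            if licensed(lhs, j):
                for l2, r2, d2, h, i2 in list(valid):
                    if d2 < len(r2) and r2[d2] == lhs and i2 == i:
                        add((l2, r2, d2 + 1, h, j))
            continue
        nxt = rhs[dot]
        if nxt in nts:
            for l2, r2 in aug:
                if l2 == nxt:
                    add((l2, r2, 0, j, j))
            for l2, r2, d2, i2, j2 in list(valid):
                if l2 == nxt and d2 == len(r2) and i2 == j and licensed(nxt, j2):
                    add((lhs, rhs, dot + 1, i, j2))
        elif j < len(tokens) and tokens[j] == nxt:
            add((lhs, rhs, dot + 1, i, j + 1))
    return valid
```

With the productions encoded as `("S", ("A", "B", "A"))`, `("S", ("C", "B", "C"))`, `("A", ("a",))`, `("B", ())` and `("C", ("c",))`, the item `("S", ("A", "B", "A"), 2, 0, 1)` is deduced when `slr1=False` but not when `slr1=True`, because c is not in Follow(A) and the first A is therefore never reduced.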

Theorem 12.6. (SLR(1)-validity)

An item in $\mathcal{I}_{SLR(1)}$ is SLR(1)-valid if and only if it is SLR(1)-viable.

Proof (sketch).‡

(i) An SLR(1)-valid item is SLR(1)-viable:

It has to be verified that every deduction step with hypotheses and/or
viable items as antecedents has a viable consequent. This can be checked
straightforwardly for each of the different kinds of deduction steps.

(ii) An SLR(1)-viable item is SLR(1)-valid:

With the same technique that was applied in Chapters 10 and 11 one
can define a walk length function on viable items. For each viable item
(except the initial item) a deduction step can be found such that the
antecedents are hypotheses and/or viable items with a strictly lower
walk length value. Hence, by induction on walk length, each viable item
is shown to be valid. □



† $\Rightarrow_{rm}$ denotes rightmost derivation.

‡ There is not much point in repeating, with different details, the somewhat lengthy
argument in Section 10.3. See [Sikkel, 1995, forthcoming] for a general approach
to this kind of proof. A complete proof of Theorem 12.6 is given in [Sikkel,
1995].

12.9 Conclusion

We have derived some parsing schemata for (Generalized) LR parsers. Similar
schemata for SLR(k), canonical LR(k) and LALR(k) can be added in the
same fashion. In this way we have shown that parsing schemata can be used
to describe parsing algorithms that are quite different from chart parsers.

The LR parsing schemata show the close relation between Generalized
LR parsing, in particular Tomita’s algorithm, and the conventional Earley
parser. A more rigorous approach, in which the Earley parsing schema is
transformed into a pushdown automaton, is given in [Sikkel, 1995, forthcoming].
Here it is insight, more than formal proof, that interested us.

In the next chapter we will exploit the relation between the parsers of 
Earley and Tomita for the definition of a parallel Tomita parser, obtained by 
cross-fertilizing Tomita’s algorithm with a bottom-up parallelization of an 
Earley parser. 




13. Parallel Bottom-up Tomita parsing



In the previous chapter we have derived the parsing schema LR(0) and
concluded that the differences with Earley are trivial details. Hence there is a
structural correspondence between Earley chart parsers and generalized LR
parsers. This correspondence can be used to cross-fertilize different variants
of either kind of algorithm. A particularly interesting example that we will
discuss here is the Parallel Bottom-up Tomita (PBT) algorithm [Lankhorst and
Sikkel, 1991], [Sikkel and Lankhorst, 1992], where the conventional
parallelization of Earley’s algorithm is applied to the Tomita parser.

The PBT algorithm improves upon the canonical Tomita parser in
several respects. An advantage that is only theoretical is that it works for all (reduced)
context-free grammars and obtains optimal sharing in the parse forest. An
interesting practical property for large grammars is that parsing tables are
small and can be computed in linear time. PBT has been implemented and
empirically tested against Tomita’s algorithm. It turns out that PBT is faster
for long sentences and slower for short sentences; it is difficult to give a
break-even point. Even though the speed-up is not overwhelming, we see this as a
moderately positive result. The algorithm works† and has some theoretical
advantages over the canonical Tomita parser. And, more important in the
setting of this book, it shows that it is possible to design novel, useful
algorithms by cross-breeding different algorithms with related underlying parsing
schemata.

In 13.1 we define a parsing schema PBT that relates to LR(0) as buE
relates to Earley. The basic algorithm is explained in 13.2 and a more
efficient variant in 13.3, followed by the construction of the (distributed) parse
list in 13.4. A formal specification of the PBT algorithm is presented in 13.5.
In 13.6 the empirical test results are reported on. A brief overview of related
approaches is given in 13.7, followed by conclusions in 13.8.

This chapter is based on cooperative work with Marc Lankhorst. A full
account of the PBT parser is given in [Lankhorst and Sikkel, 1991], and an
overview has been published as [Sikkel and Lankhorst, 1992].

† It is very hard, if at all possible, to predict theoretically how communication
bottlenecks and uneven load distribution will degrade the performance of an
algorithm that looks nice on paper. See Thompson [1989], for example, for a
parallel parser that gets slower the more processors are used for the job.



13.1 The PBT parsing schema 

The obvious way to make a parallel implementation of a Tomita parser is
to allocate each stack to a different process. Two such implementations, in
a parallel logic programming language, have been presented by Tanaka and
Numazaki. Maintaining a graph-structured stack would require too much
synchronisation; therefore they work in parallel on separate copies of linear stacks
[Tanaka and Numazaki, 1989], or with tree-structured stacks [Numazaki and
Tanaka, 1990]. A similar line of parallelization is followed by Thompson,
Dixon, and Lamping [1991]. They modify a nondeterministic shift/reduce
parser in such a way that O(n) time complexity is obtained if there are
enough resources to fork off a separate process for each ambiguity. We look
at the problem of Generalized LR parsing from quite a different angle. One
could say that our view is perpendicular to the above approaches.

A straightforward parallel version of Earley’s algorithm is obtained by
discarding the top-down filter. This eliminates the need to parse the sentence
in left-to-right fashion; one can start parsing at each word of the sentence in
parallel, cf. Section 4.6 where the bottom-up Earley schema buE has been
defined. In a similar vein, we will delete the top-down prediction from
Generalized LR parsing, and define a Tomita-like parser with an underlying
parsing schema that is almost identical to buE. Our Parallel Bottom-up Tomita
(PBT) parser will not use look-ahead; it can be seen as a parallelization of
the LR(0)-based Tomita parser.

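Schema 13.1. (PBT)

The parsing schema PBT is defined for all reduced context-free grammars (cf.
page 254). Let G' be the augmented grammar of some reduced grammar
G. A parsing system $\mathbb{P}_{PBT} = (\mathcal{I}_{PBT}, H, D_{PBT})$ is defined by

$$\mathcal{I}_{PBT} = \{[A \to \alpha \bullet \beta, i, j] \mid A \to \alpha\beta \in P' \wedge 0 \le i \le j\};$$

$$D^{Init} = \{\vdash [A \to \bullet\alpha, j, j]\},$$

$$D^{Sh} = \{[A \to \alpha \bullet a\beta, i, j], [a, j, j+1] \vdash [A \to \alpha a \bullet \beta, i, j+1]\},$$

$$D^{Re} = \{[A \to \alpha \bullet B\beta, h, i], [B \to \gamma\bullet, i, j] \vdash [A \to \alpha B \bullet \beta, h, j]\},$$

$$D_{PBT} = D^{Init} \cup D^{Sh} \cup D^{Re}.$$

The main difference between PBT and buE is the use of the extra production
$S' \to S\$$ with which the grammar has been augmented. Furthermore, buE is
also defined for non-reduced context-free grammars. □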
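
A minimal sketch of the PBT schema as an agenda-driven deduction, in contrast with the top-down filtered LR(0) schema: every initial item is available at every position, and no closure steps exist. The function name `pbt_valid_items`, the item encoding `(lhs, rhs, dot, i, j)`, and the restriction of initial items to positions 0 through n+1 are assumptions of this illustration. The usage note uses the grammar with productions S→NP VP, NP→*det *n, NP→*n, NP→NP PP, PP→*prep NP, VP→*v NP and VP→VP PP.

```python
from collections import deque


def pbt_valid_items(productions, start, words):
    """Return all items deducible by the Init, Shift and Reduce steps of PBT."""
    aug = [("S'", (start, "$"))] + list(productions)
    tokens = list(words) + ["$"]        # hypotheses, with the end marker at position n
    valid, agenda = set(), deque()

    def add(item):
        if item not in valid:
            valid.add(item)
            agenda.append(item)

    # Init: every production may start at every position (no top-down filter).
    for j in range(len(tokens) + 1):
        for lhs, rhs in aug:
            add((lhs, rhs, 0, j, j))

    while agenda:
        lhs, rhs, dot, i, j = agenda.popleft()
        if dot == len(rhs):
            # Reduce, current final item as right antecedent.
            for l2, r2, d2, h, i2 in list(valid):
                if d2 < len(r2) and r2[d2] == lhs and i2 == i:
                    add((l2, r2, d2 + 1, h, j))
            continue
        nxt = rhs[dot]
        if j < len(tokens) and tokens[j] == nxt:
            add((lhs, rhs, dot + 1, i, j + 1))          # Shift
        # Reduce, current item as left antecedent (never matches a terminal).
        for l2, r2, d2, i2, j2 in list(valid):
            if l2 == nxt and d2 == len(r2) and i2 == j:
                add((lhs, rhs, dot + 1, i, j2))
    return valid
```

On the words `*n *v *det *n *prep *det *n` this produces the accepting item (S', (S, $), 2, 0, 8), and also items that no top-down filtered parser would produce, such as (NP, (*n,), 1, 3, 4): the noun at position 3 is recognized as an NP on its own although nothing predicts an NP there. Since all items with left position i can be computed independently of the others except through completed constituents, the set naturally splits by left position, which is the basis for the parallelization.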
13.2 A PBT parser

We will define a Tomita-like parallel parsing algorithm that implements the
PBT schema. In fact we only define a recognizer here, similarly to the
annotated version of Tomita’s algorithm. The architecture of the PBT parser
comprises a sequence of processes $P_0, \ldots, P_n$ communicating in a pipeline.
See Figure 13.1. Each process computes its own part of the (distributed) parse
list. But we will defer construction of the parse list until Section 13.4. If fewer
than n processors are available for parsing a sentence $a_1 \ldots a_n$, then several
processes can share a single processor. The task of a process $P_i$ is to
recognize all constituents that start at position i in the sentence.

Fig. 13.1. A pipeline of processes

For technical reasons, recognized constituents will always be tagged with
position markers. We write $\langle i, X, j \rangle$ for a constituent X that spans the
substring $a_{i+1} \ldots a_j$ of the sentence. We use angular brackets rather than square
brackets so as to underline the difference with marked LR(0) items. It is more
convenient to start a marked symbol with the left position marker for reasons
that will become clear in Section 13.4.

Marked items are used only in the annotated versions of Tomita-like
parsers and can be disposed of. Marked symbols, on the other hand, are
essential for the algorithmic details of the PBT parser. Whenever a constituent is
recognized by some process $P_i$ it is passed down the pipeline in leftward
direction. If, for example, $P_i$ has recognized a prepositional phrase $\langle i, PP, j \rangle$, then
some other process $P_h$, having recognized a noun phrase $\langle h, NP, i \rangle$, might pick
it up and construct a composite noun phrase $\langle h, NP, j \rangle$ using the production
NP→NP PP.

Each process runs an adapted version of a Tomita parser and creates its
private graph-structured stack. Process $P_i$ starts with recognizing its “own”
word $a_{i+1}$ and delivers a constituent $\langle i, a, i+1 \rangle$ down the pipeline. Subsequently,
it reads a stream of symbols from its right neighbour, takes appropriate
actions, and sends the stream of symbols to its left neighbour. For each
constituent that is passed down the pipeline, $P_i$ checks whether it fits somewhere
onto its graph-structured stack. If so, the stack is expanded with a symbol
vertex and a state vertex. If the new state vertex allows a reduction, the
reduced symbol is added to the stack and inserted into the stream of symbols.
The last symbol in the stream is $\langle n, \$, n+1 \rangle$. Process $P_i$ terminates after the
end-of-sentence marker has been read and passed on.









state  LR(0) items      action  goto
                                *det  *n   *prep  *v   S    NP   PP   VP   $

  0    S'→•S$           sh      4     5    6      7    1    2         3
       S→•NP VP
       NP→•*det *n
       NP→•*n
       NP→•NP PP
       PP→•*prep NP
       VP→•*v NP
       VP→•VP PP
  1    S'→S•$           sh                                                 acc
  2    S→NP•VP          sh                                       9    8
       NP→NP•PP
  3    VP→VP•PP         sh                                       10
  4    NP→*det•*n       sh            11
  5    NP→*n•           re 3
  6    PP→*prep•NP      sh                                  12
  7    VP→*v•NP         sh                                  13
  8    S→NP VP•         re 1
  9    NP→NP PP•        re 4
 10    VP→VP PP•        re 7
 11    NP→*det *n•      re 2
 12    PP→*prep NP•     re 5
 13    VP→*v NP•        re 6

Fig. 13.2. An annotated PBT parsing table for G5



We will first look at an example and give a specification of the differences
between the LR(0) algorithm and PBT afterwards. The example makes use
of a slightly different grammar G5:

(1) S→NP VP         (5) PP→*prep NP
(2) NP→*det *n      (6) VP→*v NP
(3) NP→*n           (7) VP→VP PP
(4) NP→NP PP

The difference between G4 and G5 is that a PP on sentence level is attached
to the VP rather than to the S symbol. There is no linguistic motivation (as
for all the example grammars); the purpose of this change is simply to allow
for a better example.

In order to show the distributed nature of the PBT algorithm, we single
out one specific process and trace its behaviour on the example sentence “I
saw a man with a telescope.” We will focus on process $P_1$ that is to recognize
all constituents starting with the second word “saw.” The adapted parsing
table is shown in Figure 13.2. We will first follow the example and discuss
the construction of the parsing table afterwards.
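
As a cross-check on Figure 13.2, the set of PBT states for G5 can be recomputed with a few lines of code. The sketch assumes the construction discussed later in this section: the start state holds every initial item of the augmented grammar, a successor state is obtained by moving the dot over one symbol, and no closure is taken. The function names and the item encoding `(lhs, rhs, dot)` are choices made for this illustration only.

```python
def next_state(state, x):
    """Move the dot over x in every item of `state` that allows it (no closure)."""
    return frozenset((lhs, rhs, dot + 1) for lhs, rhs, dot in state
                     if dot < len(rhs) and rhs[dot] == x)


def all_states(productions, start):
    aug = [("S'", (start, "$"))] + list(productions)
    symbols = {x for lhs, rhs in aug for x in (lhs,) + rhs}
    accepting = frozenset({("S'", (start, "$"), 1)})   # the state {S' -> S . $}
    s0 = frozenset((lhs, rhs, 0) for lhs, rhs in aug)  # all initial items
    states, todo = [s0], [s0]
    while todo:
        state = todo.pop()
        for x in symbols:
            if state == accepting and x == "$":
                continue                               # this table entry is acc, not a state
            nxt = next_state(state, x)
            if nxt and nxt not in states:
                states.append(nxt)
                todo.append(nxt)
    return states


G5 = [("S", ("NP", "VP")), ("NP", ("*det", "*n")), ("NP", ("*n",)), ("NP", ("NP", "PP")),
      ("PP", ("*prep", "NP")), ("VP", ("*v", "NP")), ("VP", ("VP", "PP"))]
states = all_states(G5, "S")
```

For G5 this yields 14 states, matching states 0 to 13 of the table. Only one state besides the start state contains more than one item, namely the successor of the start state under NP, because only there do two productions share the symbol before the dot.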
The stream of symbols that is read from $P_2$ in due course† is

$$\langle 2, NP, 4 \rangle,\ \langle 4, PP, 7 \rangle,\ \langle 2, NP, 7 \rangle,\ \langle 7, \$, 8 \rangle.$$

We start with an empty stack, represented by a single state vertex labelled
0. First, $P_1$’s terminal symbol $\langle 1, {*v}, 2 \rangle$ is shifted. A symbol vertex and state
vertex are added to the stack as usual. No reduction can be made, so we
read $\langle 2, NP, 4 \rangle$ from the pipe. In state 7 this can be shifted. The new state
is 13, requiring action re VP→*v NP. So we create a symbol vertex labelled
$\langle 1, VP, 4 \rangle$ and start a new branch of the stack from the state vertex preceding
$\langle 1, {*v}, 2 \rangle$. The new state is 3. The stack that has been created so far is depicted
in Figure 13.3. For the sake of clarity the state vertices are grouped into
subsets $U_j$ with j the right position marker of the preceding symbol. In the
PBT algorithm it is essential that branches of the stack are not pruned. As
we will see in the sequel, the vertex in state 7 in $U_2$ will be used to shift
another NP onto.

Fig. 13.3. The stack after reducing $\langle 1, VP, 4 \rangle$



The next symbol that appears in the stream is $\langle 4, PP, 7 \rangle$. This is shifted in
state 3 (at position 4) and $\langle 1, VP, 4 \rangle \langle 4, PP, 7 \rangle$ is reduced to $\langle 1, VP, 7 \rangle$. Note
that $\langle 4, PP, 7 \rangle$ could not be shifted from state 13 (there is no entry in the
goto table) although $\langle 2, NP, 4 \rangle \langle 4, PP, 7 \rangle$ is reducible to a compound NP.
This is because $P_1$ only creates new symbols that start at position 1. As we
read the next symbol, it turns out that $\langle 2, NP, 7 \rangle$ has indeed been created
by $P_2$. It is shifted at position 2. Subsequently we can reduce a verb phrase
$\langle 1, VP, 7 \rangle$. This symbol is already present in the stack and need not be added
again. The last symbol, $\langle 7, \$, 8 \rangle$, cannot be shifted anywhere. It also signals
the end of the stream, hence $P_1$ has finished its task. The final parse stack of
$P_1$ is shown in Figure 13.4.
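
The trace of $P_1$ can be replayed with a much simplified model of a single process. The sketch below keeps, per position, only the set of states reached, and treats a reduction as pushing the reduced symbol back onto the input so that it is shifted from the root like any other symbol. The names `GOTO`, `REDUCE` and `run_process` are hypothetical; the table entries are the goto and reduce entries of the PBT table for G5, with the left-hand side of the reduced production stored per state.

```python
from collections import deque

# goto[state, symbol] -> state, for the PBT table of grammar G5.
GOTO = {(0, "*det"): 4, (0, "*n"): 5, (0, "*prep"): 6, (0, "*v"): 7,
        (0, "S"): 1, (0, "NP"): 2, (0, "VP"): 3,
        (2, "PP"): 9, (2, "VP"): 8, (3, "PP"): 10, (4, "*n"): 11,
        (6, "NP"): 12, (7, "NP"): 13}
# States whose action is a reduce, mapped to the left-hand side that is produced.
REDUCE = {5: "NP", 8: "S", 9: "NP", 10: "VP", 11: "NP", 12: "PP", 13: "VP"}


def run_process(i, own, incoming):
    """Simulate process P_i as a recognizer on its own symbol plus the incoming stream.

    Returns the symbols created by P_i (in order of creation) and a mapping from
    positions to the states of the state vertices at that position.
    """
    vertices = {i: {0}}               # the root: state 0 at position i
    created, seen = [], set()
    queue = deque([own] + list(incoming))
    while queue:
        left, cat, right = queue.popleft()
        for state in list(vertices.get(left, ())):
            nxt = GOTO.get((state, cat))
            if nxt is None:
                continue              # no goto entry: the shift is cancelled
            vertices.setdefault(right, set()).add(nxt)
            if nxt in REDUCE:
                new = (i, REDUCE[nxt], right)
                if new not in seen:   # a symbol already present is not added again
                    seen.add(new)
                    created.append(new)
                    queue.appendleft(new)   # processed directly after the current symbol
    return created, vertices


created, vertices = run_process(
    1, (1, "*v", 2),
    [(2, "NP", 4), (4, "PP", 7), (2, "NP", 7), (7, "$", 8)])
```

The run creates the two verb phrases (1, VP, 4) and (1, VP, 7), the second one only once although it is recognized twice, and ends with states 13 and 3 at position 4 and states 10, 3 and 13 at position 7. The shortcut of always restarting from the root relies on the fact that, without closure, every item in a state of $P_i$ started at the root of its stack.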

Symbols are sent on to the left neighbour as soon as they are read or 
created, in order to minimize waiting time. Some ordering requirements must 
be made, however, so as to guarantee the correctness of the algorithm. When 
a process has to decide whether the next symbol (i,X,j) fits anywhere onto 

^ This is in fact an optimized version of the algorithm. In a more simple version, all 
symbols recognized by all processes P 2 , . . . , P 7 pass through Pi . In the optimized 
version, a symbol is discarded by some process Pi if it can be argued that none of 
the processes Po, . . . Pi_i can use it, irrespective of the categories of their words 
ai, . . . This will be discussed in more detail in 13.3. 
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Fig. 13.4. The final stack of Pi 



the stack, it is essential that all symbols {kj F, 1 ) with k < I < i must have 
been received and, if necessary, added to the stack. For symbols with i < j 
this is no problem. Whenever a symbol (i, X^j) causes a reduction at process 
Ph with h < 2, then the reduced symbol {h,Y,j) is inserted into the stream 
directly after (2, X, j) and the ordering constraint is kept automatically. Some 
care must be taken in case of e-productions, however. In order to guarantee 
that all state vertices onto which a symbol can be shifted are created before 
the symbol arrives, we have to ensure the following conditions: 

• A symbol of the form (j, X, j) must precede all symbols (j, Y, k) with j < k.

• All symbols of the form (i, Y, j) with i < j must precede a symbol (j, X, j).

• Symbols of the form (j, X, j) and (j, Y, j) must precede each other.

The first two conditions are easy to satisfy. Above we have given a slightly 
oversimplified description of the algorithm. Before the “own” terminal symbol 
is processed, Pj carries out all reductions of nullable constituents at position 
j. The third condition is rather more awkward. A nullable symbol has to be 
re-tried for a shift after other nullable symbols at the same position have 
been received. 
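
The first two conditions can be checked mechanically on a recorded stream. The following is a minimal sketch in Python, not part of the original implementation; the function name and the representation of marked symbols as plain tuples (i, X, j) are assumptions made for illustration.

```python
def ordering_violations(stream):
    """Return (earlier, later) pairs in `stream` that break condition 1 or 2.

    `stream` is a list of marked symbols (i, X, j) in the order in which they
    travel down the pipe; a symbol with i == j is a nullable constituent.
    """
    bad = []
    for pos, (i, X, j) in enumerate(stream):
        if i != j:
            continue
        # Condition 1: no (j, Y, k) with j < k may come before (j, X, j).
        for h, Y, k in stream[:pos]:
            if h == j and j < k:
                bad.append(((h, Y, k), (i, X, j)))
        # Condition 2: no (h, Y, j) with h < j may come after (j, X, j).
        for h, Y, k in stream[pos + 1:]:
            if k == j and h < j:
                bad.append(((i, X, j), (h, Y, k)))
    return bad
```

The third condition cannot be expressed as a static ordering, which is why the text handles it by re-trying nullable symbols instead.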

For grammars where large subtrees can be rewritten to ε, one could
pre-compute all nullable symbols, start each process with this pre-computed
stack, and also pre-compute an order in which (possibly multiple copies of)
nullable symbols have to be sent down the pipe. In that case the third
condition can be dropped and some work of each process is done at compile time
rather than at run time. We have not added such sophistication to our
implementation, however. For natural language grammars this is hardly an issue.
We did implement a simplification of the reduce action. Rather than carrying 
out a proper reduction, a recognized symbol is pushed back onto the input 
and subsequently shifted just like any other symbol. 

The construction of the PBT parsing table is in fact much simpler than 
the construction of any LR table. It is easy to prove that the number of states 

function next_state(I: set of items, X: symbol): set of items;
begin
    if I = … and X = $
    then next_state := accept
    else next_state := {A→αX•β | A→α•Xβ ∈ I}
    fi
end;

function all_states: set of sets of items
begin
    s_0 := {A→•α | A→α ∈ P'};
    C := {s_0};
    while there is a state I ∈ C and a symbol X ∈ V'
          such that next_state(I, X) ≠ ∅ and next_state(I, X) ∉ C
    do C := C ∪ {next_state(I, X)} od;
    all_states := C
end;

procedure construct PBT table
begin
    C := all_states;
    for each I ∈ C
    do action[I] := ∅;
       for each X ∈ V' do goto[I, X] := error od;
       for each item ∈ I
       do case item of
            A→α•aβ:
                action[I] := action[I] ∪ {shift};
                goto[I, a] := next_state(I, a)
            A→α•Bβ:
                goto[I, B] := next_state(I, B)
            A→α•:
                action[I] := action[I] ∪ {reduce A→α}
          esac
       od
    od
end;

Fig. 13.5. Computation of the PBT states and parsing table



is O(|G|), i.e., linear in the size of the grammar.^ If only non-empty entries
in the goto table are represented, the size of the parsing table is O(|G|). And,
more importantly, computing the table takes O(|G|) time.

The cause of this simplicity is the absence of the notion of a closure.
This is because P_i only has to recognize constituents starting at position i. If
(in the annotated version) an item [A→α•Bβ, i, j] has been computed, with
i < j, there is no need to start parsing B. This is the task of process P_j. If
such a B exists, it will simply arrive through the pipeline. An algorithm for
computation of the PBT parsing table is presented in Figure 13.5.
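
The closure-free construction can be sketched in a few lines of Python. This is an illustrative sketch only, under stated assumptions: items are represented as triples (lhs, rhs, dot), the toy grammar is modelled on the productions visible in the running example rather than quoted from a definition, the accept entry for the end marker is left out, and all function names are hypothetical.

```python
from collections import deque

# Hypothetical toy grammar (an assumption, not a quoted definition).
GRAMMAR = [
    ("S", ("NP", "VP")),
    ("NP", ("*n",)), ("NP", ("*det", "*n")), ("NP", ("NP", "PP")),
    ("PP", ("*prep", "NP")),
    ("VP", ("*v", "NP")), ("VP", ("VP", "PP")),
]
TERMINALS = {"*n", "*det", "*prep", "*v"}

def next_state(state, X):
    """Move the dot over X in every item that has X right after the dot."""
    return frozenset((lhs, rhs, dot + 1) for lhs, rhs, dot in state
                     if dot < len(rhs) and rhs[dot] == X)

def all_states(productions):
    """Collect all states reachable from s0; no closure is ever taken."""
    s0 = frozenset((lhs, rhs, 0) for lhs, rhs in productions)
    symbols = sorted({X for _, rhs in productions for X in rhs})
    states, todo = {s0}, deque([s0])
    while todo:
        state = todo.popleft()
        for X in symbols:
            target = next_state(state, X)
            if target and target not in states:
                states.add(target)
                todo.append(target)
    return s0, states

def build_table(productions, terminals):
    """Return (s0, action, goto); absent goto entries stand for `error`."""
    s0, states = all_states(productions)
    action = {state: set() for state in states}
    goto = {}
    for state in states:
        for lhs, rhs, dot in state:
            if dot == len(rhs):
                action[state].add(("reduce", lhs, rhs))
            else:
                X = rhs[dot]
                if X in terminals:
                    action[state].add("shift")
                goto[(state, X)] = next_state(state, X)
    return s0, action, goto
```

In this representation every state other than the initial one corresponds to a distinct non-empty prefix of some right-hand side, which is one way to see why the number of states stays linear in the size of the grammar.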



^3 See (11.4) on page 239 for a definition of |G|.

The differences between PBT and the LR(0) Tomita parser can be summarized
as follows.

• Every process Pi runs an adapted parsing table without look-ahead defined 
by the algorithm in Figure 13.5. 

• The algorithm that is run by each process does not synchronise on shifts; 
therefore the ordering requirements as stated on page 294 must be obeyed. 

• Two position markers are tagged onto each recognized symbol, in order to 
keep track of the substring that is spanned by the symbol. 

• On a reduce it is not allowed to prune the reduced branch of the stack. 

A complete specification of the PBT algorithm, compatible in style with
the specification of Tomita’s algorithm, is given in Section 13.5. For acyclic
grammars without hidden left-recursion, it is straightforward to verify that a
state vertex u can be annotated with a set of marked LR(0) items items(u)
and that the annotated PBT algorithm duly implements the PBT parsing
schema.

Surprisingly, perhaps, the PBT algorithm also works for cyclic and hidden
left-recursive grammars. We will come back to this in Section 13.4, where we
discuss the construction of a parse list.



13.3 A more efficient PBT parser 

The PBT algorithm as discussed above suffers from some inefficiency. Most
recognized symbols can be used only locally and it may easily lead to a
communication bottleneck if every symbol is passed down the entire pipeline.
In the example on page 293, only four symbols were received by P_1: two NPs,
a PP and an end-of-sentence marker. Filtering of useless symbols had been
applied there already. Without such a communication filter, P_1 would receive
the following stream of symbols:

(2, *det, 3), (3, *n, 4), (3, NP, 4), (2, NP, 4), (4, *prep, 5),

(5, *det, 6), (6, *n, 7), (6, NP, 7), (5, NP, 7), (4, PP, 7),

(3, NP, 7), (2, NP, 7), (7, $, 8).

The majority of these symbols can be discarded higher up in the pipeline. We 
will define two criteria to detect that a symbol is useless for the remainder of 
the pipeline and should be discarded. 

The first case is simple. Consider a symbol X ∈ V that appears only as the
first symbol in right-hand sides of productions. In such a case, a symbol (i, X, j)
can only be used by process P_i and by no other process. As an example,
consider *det, which only appears in the production NP→*det *n. When P_5
finds a determiner (5, *det, 6) it can only contribute to the recognition of NPs
starting at position 5. Hence, it need not be sent on to P_4 and further down.
Formally,

• communication savings rule I:

P_i writes a symbol (i, X, j) to P_{i-1} only if there are A, Y, α, β such that

A→YαXβ ∈ P.

A communication savings table for grammar G_5 is shown in Figure 13.6.



| *det | *n | *v | *prep | NP | PP | VP | S |
|  -   | +  | -  |   -   | +  | +  | +  | - |

(+ in entry X means: P_i passes symbols (i, X, j) to P_{i-1})
Fig. 13.6. Communication savings table I for G_5
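
Rule I amounts to one pass over the right-hand sides. The sketch below shows this in Python; the grammar is a hypothetical stand-in modelled on the productions visible in the running example, and the function name is an assumption.

```python
# Hypothetical toy grammar (an assumption, not a quoted definition).
GRAMMAR = [
    ("S", ("NP", "VP")),
    ("NP", ("*n",)), ("NP", ("*det", "*n")), ("NP", ("NP", "PP")),
    ("PP", ("*prep", "NP")),
    ("VP", ("*v", "NP")), ("VP", ("VP", "PP")),
]

def savings_table_one(productions):
    """Map each grammar symbol to True ('+') if it occurs in a non-initial
    position of some right-hand side, i.e. if rule I lets P_i pass it on."""
    symbols = {lhs for lhs, _ in productions}
    symbols |= {X for _, rhs in productions for X in rhs}
    passed = {X for _, rhs in productions for X in rhs[1:]}
    return {X: X in passed for X in symbols}
```

For this toy grammar the symbols marked True are *n, NP, PP and VP.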



A second, somewhat more involved communication savings scheme is the
following. Each process P_i has its “own” terminal a_{i+1}. Is it possible, knowing
the marked terminal (i, a_{i+1}, i+1), to discard symbols (i+1, X, j) that
arrive from P_{i+1}? Evidently, (i+1, X, j) can only contribute to a parse if
X ∈ FOLLOW(a_{i+1}). If X cannot logically follow a_{i+1} then the marked symbol
can be discarded. An example of this is (3, NP, 4). An NP cannot follow *det,
but P_3 has no way of knowing that this is indeed the case. So the NP is sent
on to P_2, which is able to determine that (3, NP, 4) is indeed useless.

A more subtle filtering scheme is possible, however. As an example,
consider the marked symbol (6, *n, 7) that is received by P_5. This is clearly a
useful symbol; *n ∈ FOLLOW(*det) and it is used to construct (5, NP, 7).
But we will argue that it cannot be used by P_0, ..., P_4 and hence need not
be sent on. A close inspection of the parsing table in Figure 13.2 shows that
some *n can be used only if a process has an immediately preceding *det on
its stack. As (5, *det, 6) is not sent on to P_4, by communication savings rule
I, there is no way in which any process down the pipeline could do anything
useful with (6, *n, 7). In general, if P_i owns terminal (i, a, i+1), a symbol
(i+1, X, j) needs to be passed on if the combination aX appears somewhere
but not at the beginning of a right-hand side, or else if a combination AX
appears in the right-hand side of a production and A produces a string ending
with a. More formally:

• communication savings rule II:

P_i, having recognized a terminal symbol (i, a, i+1), writes a marked symbol
(i+1, X, j) to P_{i-1} only if one of the following cases applies:

(i) there are B, Y, α, β such that B→αYaXβ ∈ P;

(ii) there are B, A, α, β such that B→αAXβ ∈ P and a ∈ LAST(A).^

Communication savings table II for grammar G_5 is shown in Figure 13.7. See
Lankhorst and Sikkel [1991] for an algorithm that computes communication
savings table II for an arbitrary grammar.

^ Last is the mirror image of First, cf. Section 12.1. 

a \ X  | *det | *n | *v | *prep | NP | PP | VP | S
-------+------+----+----+-------+----+----+----+---
*det   |  -   | -  | -  |   -   | -  | -  | -  | -
*n     |  -   | -  | -  |   -   | -  | +  | +  | -
*v     |  -   | -  | -  |   -   | -  | -  | -  | -
*prep  |  -   | -  | -  |   -   | -  | -  | -  | -

(+ in entry [a, X] means: if a_{i+1} = a then P_i passes (i+1, X, j) to P_{i-1})

Fig. 13.7. Communication savings table II for G_5
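
The two cases of rule II can be computed directly from the productions once the LAST sets are known. The following Python sketch is not the algorithm of Lankhorst and Sikkel [1991]; it is a simple fixpoint computation written for illustration, it assumes a grammar without ε-productions, and the grammar and all names in it are hypothetical.

```python
# Hypothetical toy grammar (an assumption, not a quoted definition).
GRAMMAR = [
    ("S", ("NP", "VP")),
    ("NP", ("*n",)), ("NP", ("*det", "*n")), ("NP", ("NP", "PP")),
    ("PP", ("*prep", "NP")),
    ("VP", ("*v", "NP")), ("VP", ("VP", "PP")),
]
TERMINALS = {"*n", "*det", "*prep", "*v"}

def last_sets(productions, terminals):
    """LAST(A): the terminals that can end a string derived from A.
    Assumes no ε-productions and that every nonterminal has a production."""
    last = {lhs: set() for lhs, _ in productions}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in productions:
            tail = rhs[-1]
            new = {tail} if tail in terminals else last[tail]
            if not new <= last[lhs]:
                last[lhs] |= new
                changed = True
    return last

def savings_table_two(productions, terminals):
    """Return the set of pairs (a, X) that get a '+' under rule II."""
    last = last_sets(productions, terminals)
    plus = set()
    for _, rhs in productions:
        for pos in range(len(rhs) - 1):
            left, X = rhs[pos], rhs[pos + 1]
            if left in terminals:
                if pos >= 1:          # case (i): some Y precedes a
                    plus.add((left, X))
            else:                     # case (ii): a in LAST(A)
                plus.update((a, X) for a in last[left])
    return plus
```

For the toy grammar only the pairs (*n, PP) and (*n, VP) receive a '+', which agrees with the entries of the table above.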



It is possible to define grammars in which some junk will slip through the
meshes of our two filters, and more sophisticated filtering mechanisms could
provide further, smaller optimizations. Consider, for example, a grammar

{S→abD, S→ccD, D→d, D→cd}

and an input string abcd. Then P_2, owning a terminal (2, c, 3), will pass
(3, D, 4), which satisfies communication savings rules I and II(i). In this case
P_1 could detect, when it is supplied with enough sophistication, that (3, D, 4)
is no longer useful. We conjecture, however, that adding such sophistication will
only be detrimental to the average-case efficiency of the algorithm; weird
constructions like this are unlikely to appear in natural language grammars.



13.4 The construction of a distributed parse list 

The PBT parser can be easily extended with the computation of a packed
shared forest, represented by a parse list. Each process computes its own part
of the parse list. That is, the output of P_i contains all entries in the parse list
with left position marker i. We need to make a single technical adjustment,
however. Entries in the parse list of P_i may contain pointers to entries in other
parts of the distributed parse list. To that end we tag such pointers onto the
symbols that are passed down the pipeline. The left position marker i of a
symbol is annotated with its local label in the parse list. Marked symbols
now have the format (i.k, X, j), where k indicates the k-th entry in the parse
list of P_i. The combination of left place marker and local label provides a
unique reference across the different partial parse lists. In Figure 13.8 a parse
list for the example sentence is shown.
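
As an illustration of how the labels i.k tie the partial parse lists together, the sketch below stores the entries of Figure 13.8 in a Python dictionary and counts the parse trees below a node. The entries are transcribed from the figure, with labels and positions as implied by its cross-references; the dictionary layout and the helper names are assumptions, and the counting function only terminates on acyclic forests.

```python
from math import prod

# label "i.k" -> (symbol, right position, list of alternative child lists);
# terminal entries have no alternatives.
PARSE_LIST = {
    "6.1": ("*n", 7, []), "6.2": ("NP", 7, [["6.1"]]),
    "5.1": ("*det", 6, []), "5.2": ("NP", 7, [["5.1", "6.1"]]),
    "4.1": ("*prep", 5, []), "4.2": ("PP", 7, [["4.1", "5.2"]]),
    "3.1": ("*n", 4, []), "3.2": ("NP", 4, [["3.1"]]),
    "3.3": ("NP", 7, [["3.2", "4.2"]]),
    "2.1": ("*det", 3, []), "2.2": ("NP", 4, [["2.1", "3.1"]]),
    "2.3": ("NP", 7, [["2.2", "4.2"]]),
    "1.1": ("*v", 2, []), "1.2": ("VP", 4, [["1.1", "2.2"]]),
    "1.3": ("VP", 7, [["1.1", "2.3"], ["1.2", "4.2"]]),
    "0.1": ("*n", 1, []), "0.2": ("NP", 1, [["0.1"]]),
    "0.3": ("S", 4, [["0.2", "1.2"]]), "0.4": ("S", 7, [["0.2", "1.3"]]),
}

def owner(label):
    """Index i of the process P_i whose partial parse list holds `label`."""
    return int(label.split(".")[0])

def count_trees(label, parse_list=PARSE_LIST):
    """Number of parse trees below `label`; assumes an acyclic forest."""
    _, _, alternatives = parse_list[label]
    if not alternatives:
        return 1
    return sum(prod(count_trees(child, parse_list) for child in alt)
               for alt in alternatives)
```

Calling count_trees("0.4") yields 2, one tree for each attachment of the PP.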

The parse forest is not identical to the one produced by Tomita’s
algorithm. The nodes in our parse forest satisfy the following specification:

• a node (i, X, j) is contained in the forest if and only if X ⇒* a_{i+1} ... a_j.

The PBT forest contains more nodes that are not reachable from the root,
because the top-down filtering has been discarded. On the other hand, if
X ⇒* a_{i+1} ... a_j, then it is guaranteed that the PBT forest contains a unique
node (i, X, j) (possibly containing multiple sub-nodes). In Tomita’s
algorithm, a symbol that spans some specific part of the sentence is usually
represented by a single node. Sharing may fail, however, when identical symbol

symbol            children
----------------  ----------------------
(6.1, *n, 7)
(6.2, NP, 7)      (6.1)

(5.1, *det, 6)
(5.2, NP, 7)      (5.1, 6.1)

(4.1, *prep, 5)
(4.2, PP, 7)      (4.1, 5.2)

(3.1, *n, 4)
(3.2, NP, 4)      (3.1)
(3.3, NP, 7)      (3.2, 4.2)

(2.1, *det, 3)
(2.2, NP, 4)      (2.1, 3.1)
(2.3, NP, 7)      (2.2, 4.2)

(1.1, *v, 2)
(1.2, VP, 4)      (1.1, 2.2)
(1.3, VP, 7)      (1.1, 2.3) (1.2, 4.2)

(0.1, *n, 1)
(0.2, NP, 1)      (0.1)
(0.3, S, 4)       (0.2, 1.2)
(0.4, S, 7)       (0.2, 1.3)

Fig. 13.8. The parse list, root node is 0.4



vertices on the stack are followed by different state vertices. Hence an ex- 
act specification of Tomita’s parse forest is very complicated (in fact Tomita 
doesn’t give one), as it depends on the idiosyncrasies of the particular LR 
parsing table. 

A more substantial improvement upon Tomita’s algorithm, from a
theoretical perspective, is that PBT runs on arbitrary context-free grammars.
Consider, again, the hidden left-recursive grammar

{S→ASb, S→a, A→ε}

that was used as a counterexample in Section 12.6. Tomita’s algorithm,
anticipating an arbitrary number of b’s, creates infinitely many A’s for a start.
The infinite series of reductions is driven by

closure({S→A•Sb}) = {S→A•Sb, S→•ASb}.

When the parser gets into this state, with look-ahead a, it will ε-reduce an
A and move on to the same state. PBT, in contrast, will only reduce a single
(0, A, 0). There is no cycle in the parsing table because the closure function
was not used in its construction. In Figure 13.9 the graph-structured PBT
stack of P_0 is shown for the sentence ab. The parse list is shown in
Figure 13.10.

Cyclic grammars are also parsed in a natural way, without the need for
extra sophistication. Consider the grammar {S→S, S→a}, and the sentence
a. When (0, S, 1) is recognized, it is reduced to (0, S, 1), which is already
present, and need not be added again. Thus the parser will add the
corresponding node as a sub-node to itself. The complete parse list is shown in
Figure 13.12. The parse forest is drawn as a graph in Figure 13.11.

Fig. 13.9. Parse stack of P_0 for the sentence ab

symbol            children
----------------  ----------------------
(2.1, A, 2)       ()

(1.1, A, 1)       ()
(1.2, b, 2)

(0.1, A, 0)       ()
(0.2, a, 1)
(0.3, S, 1)       (0.2)
(0.4, S, 2)       (0.1, 0.3, 1.2)

Fig. 13.10. Parse list of P_0 for the sentence ab

Fig. 13.11. Parse forest for a,
G = {S→S, S→a}

symbol            children
----------------  ----------------------
(0.1, a, 1)
(0.2, S, 1)       (0.2), (0.1)

Fig. 13.12. The parse list for a,
G = {S→S, S→a}



Rekers, in Chapter 1 of his Ph.D Thesis [1991], discusses how optimal node 
sharing and parsing of arbitrary context-free grammars can be obtained. In 
PBT these features come about naturally. 



13.5 A formal definition of the PBT algorithm

The following formal description of PBT is based on Lankhorst [1991]. It is 
in a style similar to the formal description of Tomita’s algorithm in Section 
12.5. 

It is useful, perhaps, to remind the reader that the direction of the edges 
is from the top of the stack to the bottom (i.e., in all figures, from right to 
left). 

In the formal description we use the following functions and global
variables:

Γ_i: graph-structured stack in processor P_i. This is a directed, acyclic graph
    with a single leaf node, v_0, labelled with state number s_0. Γ_i is initialized
    in PARSE and altered in SHIFTER.

T_i: shared packed forest in processor P_i. This is a directed graph (V_i, E_i)
    in which each vertex v ∈ V_i may have more than one successor list
    (v, L) ∈ E_i. Initialized in PARSE and altered in REDUCER, E-REDUCER,
    and SHIFTER.

r_i: the result returned by processor P_i. This is a set of vertices of T_i which
    form the roots of the parse forest in P_i. r_0 contains the global result.
    Initialized in PARSE and altered in SHIFTER.

U_{i,k}: set of vertices of Γ_i, for which the following property holds:
    u ∈ U_{i,k} ⇒ some partial parse of the substring a_{i+1} ... a_k of the input
    string (produced by P_i) is contained in the portion of the stack following
    u.
    Initialized in PARSEWORD and altered in SHIFTER.

A: subset of “active” vertices of U_{i,k} on which reductions and shift actions
    can be carried out. A is initialized in PARSEWORD and altered in ACTOR
    and SHIFTER.

R: set of edges to be reduced. Each element is a triple (v, x, p) with v ∈ U_{i,k},
    x ∈ SUCCESSORS(v) and p a non-empty production of G. (v, x, p) ∈ R
    means that reduce p is to be applied on the path starting with the edge
    from v to x. REDUCER will take care of it. R is initialized in PARSEWORD
    and altered in ACTOR, REDUCER, and SHIFTER.

R_e: set of vertices on which an ε-reduction is to be carried out. Each element
    is a pair (v, p) with v ∈ U_{i,k} and p an ε-production. (v, p) ∈ R_e means
    that reduce p is to be applied on the vertex v. E-REDUCER will carry
    out this reduction. R_e is initialized in PARSEWORD and altered in ACTOR
    and E-REDUCER.

Q: set of vertices to be shifted on. If (j, X, k) is to be shifted, Q is defined
    as follows:
    Q = {(v, s) | v ∈ V ∧ s = GOTO(STATE(v), sym) ∉ {error, accept}}.
    In this definition, V ⊆ U_{i,j} is a set of vertices on which a shift action
    may be carried out. (v, s) ∈ Q means that shift s is to be carried out
    on v. SHIFTER will take care of this. Q is local to SHIFTER.

S: contains the symbols (j, X, j) which have so far been read by processor
    P_i. When a symbol (k, X, l) (l > j) is read from the pipeline, the
    elements of S are written to the pipeline and S is emptied. S is initialized
    and altered in PARSEWORD and used in SHIFTER.

LEFT(p): left-hand side of production p.

|p|: length of the right-hand side of production p.

STATE(v): takes a vertex in Γ_i as its argument and returns the state label of
    this vertex.

SYMBOL(x): takes a vertex in Γ_i as its argument and returns the symbol label
    of this vertex. This label is a link to a vertex in T_i.

SUCCESSORS(v): takes a vertex in Γ_i as its argument and returns the set of
    all vertices x in Γ_i such that there is an edge from v to x.

GOTO(s, A): looks up the goto table and returns a state number;
    s is a state number and A is a grammar symbol.

ACTION(s): looks up the action table and returns a set of actions;
    s is a state number.

ADDSUBNODE(v, L): takes a vertex v in T_i and a successor list L as arguments
    and adds (v, L) to E_i in T_i = (V_i, E_i).

BUFFER((i.m, A, j)): buffers a symbol (i.m, A, j) in a first-in first-out buffer.
    When a read action is executed, this buffer is read, and only if it is
    empty a symbol is read directly from the incoming pipe.

READ((i.m, A, j)): reads a symbol (i.m, A, j) from the buffer or incoming
    pipe.

WRITE((i.m, A, j)): writes a symbol (i.m, A, j) into the outgoing pipe.

PUSH((i.m, A, j)): pushes a symbol (i.m, A, j) back into the incoming pipe.

S_i: state i of the parsing table, consisting of a set of dotted rules.

Q_{i,X}: state to go to from state S_i on symbol X, defined as
    {A→αX•β | A→α•Xβ ∈ S_i}.
The parser is defined by the following set of procedures.

procedure PARSE(G, a_1 ... a_n)
begin
    for i := 0 to n in parallel
    do Γ_i := ∅; T_i := ∅; r_i := ∅;
       create in Γ_i a vertex v_0 labelled s_0;
       PARSEWORD(i)
    od;
    return r_0, the set of roots of the parse forest
end PARSE;

procedure PARSEWORD(i)
begin
    k := i; U_{i,k} := {v_0}; A := U_{i,k};
    R := ∅; R_e := ∅; S := ∅;
    previous := 0;
    INPUT((i, a_{i+1}, i+1));
    create in T_i a node m labelled a_{i+1};
    PUSH((i.m, a_{i+1}, i+1));
    repeat
        while A ≠ ∅ do ACTOR od;
        while R ≠ ∅ do REDUCER od;
        if R_e ≠ ∅ then E-REDUCER fi;
        READ((first.label, sym, last));
        if last ≠ previous
        then for all (j.l, X, j) ∈ S do WRITE((j.l, X, j)) od fi;
        S := ∅;
        previous := last;
        if first = last
        then S := S ∪ {(first.label, sym, last)}
        else WRITE((first.label, sym, last))
        fi;
        SHIFTER((first.label, sym, last), U_{i,first});
        k := last;
    until sym = $
end PARSEWORD;



procedure ACTOR
begin
    remove one element v from A;
    for all a ∈ ACTION(STATE(v))
    do if a = reduce p and p is not an ε-production
       then for all x ∈ SUCCESSORS(v)
            do R := R ∪ {(v, x, p)} od
       elseif a = reduce p and p is an ε-production
       then R_e := R_e ∪ {(v, p)}
       fi
    od
end ACTOR;

procedure REDUCER
begin
    remove one element (v, x, p) from R;
    N := LEFT(p);
    for all y such that there exists a path of length 2|p| − 2 from x to y
    do L := (SYMBOL(z_1), ..., SYMBOL(z_|p|)), where
            z_1 = x, z_|p| = y and z_2, ..., z_{|p|−1} are
            symbol vertices in the path from x to y;
       for all s such that
               ∃w(w ∈ SUCCESSORS(y) ∧ GOTO(STATE(w), N) = s)
       do W := {w | w ∈ SUCCESSORS(y)
                    ∧ GOTO(STATE(w), N) = s};
          if ∃u(u ∈ U_{i,k} ∧ STATE(u) = s)
          then if there is an edge from u to a vertex z
                  such that SUCCESSORS(z) = W
               then ADDSUBNODE(SYMBOL(z), L)
               else if T_i does not contain
                       a node m labelled N
                    then create in T_i a node m labelled N fi;
                    ADDSUBNODE(m, L);
                    BUFFER((i.m, N, k))
               fi
          else if T_i does not contain a node m labelled N
               then create in T_i a node m labelled N fi;
               ADDSUBNODE(m, L);
               BUFFER((i.m, N, k));
          fi
       od
    od
end REDUCER;

procedure E-REDUCER    (* will only be called if k = i *)
begin
    for all (v, p) ∈ R_e
    do N := LEFT(p);
       create in T_i a node m labelled N;
       ADDSUBNODE(m, NIL);
       BUFFER((i.m, N, k))
    od;
    R_e := ∅;
end E-REDUCER;

procedure SHIFTER((first.label, sym, last), V)
begin
    r_i := r_i ∪ {SYMBOL(m) | m ∈ SUCCESSORS(v) ∧ v ∈ V ∧
                  GOTO(STATE(v), sym) = accept};
    Q := {(v, s) | v ∈ V ∧ s = GOTO(STATE(v), sym) ∉ {error, accept}};
    W := ∅;
    for all s such that ∃v((v, s) ∈ Q)
    do if ∃w(w ∈ U_{i,last} ∧ STATE(w) = s)
       then create in Γ_i a vertex x labelled first.label;
            create in Γ_i an edge from w to x;
            for all v such that (v, s) ∈ Q
            do create in Γ_i an edge from x to v od;
            if w ∉ A
            then for all q such that reduce q ∈ ACTION(s)
                     and q is not an ε-production
                 do R := R ∪ {(w, x, q)} od
            fi
       else create in Γ_i two vertices w and x labelled
                s and first.label, respectively;
            create in Γ_i an edge from w to x;
            for all v such that (v, s) ∈ Q
            do create in Γ_i an edge from x to v od;
            U_{i,last} := U_{i,last} ∪ {w};
            A := A ∪ {w};
            W := W ∪ {w};
       fi
    od;

    then for all (last.m, X, last) ∈ S
         do SHIFTER((last.m, X, last), W) od
    fi
end SHIFTER;



13.6 Empirical results

The PBT algorithm has been tested in a series of experiments in which
parallel execution was simulated on a single workstation. In this way we could
experiment with an arbitrary number of (simulated) processors.

The simulation set-up is as follows. Each (virtual) process is run consec- 
utively. The stream of symbols is stored internally, rather than written to a 
pipe. When the next virtual process is started, the clock is reset. For every 
(simulated) read and write an extra processing time of 1 ms is counted. Each 
symbol that is sent from one virtual process to another is timestamped. When 
a process receives a symbol with a time stamp later than its own time, the 
clock is updated and the waiting time accounted for. 
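
The clock bookkeeping described above can be sketched as follows. This is a hypothetical Python reconstruction of the scheme as stated in the text, not the original C code; the class and method names are assumptions.

```python
class VirtualClock:
    """Local clock of one simulated process, in milliseconds."""

    IO_COST_MS = 1.0  # charge per simulated read or write, as in the text

    def __init__(self):
        self.now = 0.0     # local time; reset for every virtual process
        self.waited = 0.0  # accumulated waiting time

    def work(self, ms):
        """Account for `ms` milliseconds of local computation."""
        self.now += ms

    def write(self):
        """Charge one write and return the timestamp to attach to the symbol."""
        self.now += self.IO_COST_MS
        return self.now

    def read(self, stamp):
        """Charge one read; wait first if the symbol was sent in the future."""
        if stamp > self.now:
            self.waited += stamp - self.now
            self.now = stamp
        self.now += self.IO_COST_MS
```

A sender that works for 5 ms and then writes stamps its symbol with 6 ms; a receiver whose clock stands at 2 ms then books 4 ms of waiting time before paying for the read.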

We^ implemented PBT in the language C and re-implemented Tomita’s 
algorithm so as to ensure compatibility. We have not attempted to optimize 
run-time efficiency at the expense of straightforwardness. The timing experi- 
ments have been conducted on a Commodore Amiga because of its accurate 
timing capabilities. 

The grammars and example sentences are the ones given by Tomita [1985]. 
Grammar I is the example grammar G 5 . Grammars II, III and IV have 42, 
223 and 386 rules, respectively. Sentence set A contains 40 sentences, taken 
from actual publications, as listed in the appendix of [Tomita, 1985]. Set B 
is constructed as *n *v *det *n (*prep *det *n)^{k-1} with k ranging from 1 to 13.
In Figures 13.13 and 13.14 the timing results for set B and grammars III and 
IV are plotted on a double logarithmic scale. These figures show that gain 
in speed due to parallelization outweighs the additional communication over- 
head only if a sentence is sufficiently long. An exact break-even point cannot 
be given, as it depends on the grammar, the sentence, the characteristics of 
the parallel architecture and the implementation. 

Similarly, Figure 13.14 shows that the extra overhead for filtering pays 
off only if the sentence is not too small. We could tip the balance somewhat 
more in favour of PBT by improving the filter. In the program that was used 
to produce these plots, the filter has a computational complexity linear in 
the size of the grammar. In retrospect, this could have been handled rather 
more efficiently. Adding sophistication to handling the graph structured stack 
and parsing table look-up could improve the performance in absolute terms; 
relatively it would make less difference, however, as all programs would benefit 
from it. 

Testing sentence set A produces plots of a more varied nature, as sentences 
of comparable length may differ a lot in complexity. Using linear regression 
analysis, we found the overall trend to be similar to the results for set B. A 
series of other plots can be found in [Lankhorst and Sikkel, 1991]. 



This work was done by Marc Lankhorst. 

Fig. 13.15. Estimated asymptotic complexity for set B (rows: Tomita; PBT,
unfiltered; PBT, with filtering; columns: grammars III and IV)



The complexity of a parsing algorithm can be measured as a function of 
the length of the input sentence. For formal languages this makes sense, as 
strings (i.e., computer programs) can be very long indeed. For natural lan- 
guages this is a rather doubtful measure. The size of the grammar, usually 
much larger than the average sentence, is constant and therefore considered 
irrelevant. Moreover, constant factors as discussed above are abstracted from. 
Nevertheless, sentence set B shows the complexity of the algorithms rather 
nicely, because of the combinatorial explosion of PP attachment ambiguities. 
For set B and grammars III and IV we estimated the asymptotic complexity. 
These figures, for what they are worth, are shown in Figure 13.15. Similar 
computations for sentence set A confirm the trend that the complexity of 
PBT, using n parallel processes, is roughly O(√n) better than Tomita’s
algorithm. Hence, waiting time and uneven load balancing account for a factor
O(√n) as well. See [Lankhorst and Sikkel, 1991], again, for all the details.
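
One common way to obtain an exponent estimate of this kind is to fit a straight line to the timings on a double logarithmic scale; the slope of the line is the estimated exponent. The sketch below shows such a least-squares fit in Python. It is given for illustration and is not necessarily the procedure that was used for Figure 13.15; the function name is an assumption.

```python
import math

def loglog_slope(lengths, times):
    """Least-squares slope of log(time) against log(length).

    If time grows like c * n**e, the returned slope approximates e.
    """
    xs = [math.log(n) for n in lengths]
    ys = [math.log(t) for t in times]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx
```

On real measurements the fit is only indicative, since constant overheads dominate for short sentences.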

Finally, we have estimated the speed of the PBT algorithm as a function
of the number of processors. The 37 processes for sentence 13 of set B
have been allocated to any number of processors ranging from 1 to 37, with
the processes evenly distributed over the processors. Let p be the number of
processors; then there is a natural number k such that k ≤ 37/p < k + 1. The
higher ranked processes are grouped into clusters of k + 1, the lower ranked
ones in clusters of k per processor. The results are shown in Figure 13.16.
The decline is sharpest when incrementing p causes a decrease of k, in which
case the processor handling P_0, ..., P_{k-1} is relieved of one of its processes.
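
The allocation of processes to processors can be written down in two lines. The following Python sketch follows the description above; the function name is an assumption.

```python
def cluster_sizes(processes, p):
    """Sizes of the p clusters, lowest-ranked processes first.

    With k = processes // p, the lower ranked processes are grouped in
    clusters of k and the higher ranked ones in clusters of k + 1.
    """
    k, extra = divmod(processes, p)
    return [k] * (p - extra) + [k + 1] * extra
```

For 37 processes, going from 18 to 19 processors lowers k from 2 to 1, so the first cluster shrinks from two processes to one.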



13.7 Related approaches 

A Parallel LR parser that also uses a “bottom-up” approach to parallelization 
has been defined by Fischer [1975]. But the similarity to PBT is merely 
superficial. Fischer runs Synchronous Parsing Machines (SPM’s) on various 
parts of the sentence in parallel. An SPM tries to parse its part of the input 
until it hits upon the starting point of its successor and then merges
with its successor. The fundamental difference with PBT is that Fischer’s 
algorithm really merges parse stacks. PBT has separate parse stacks, but 
each processor may use nonterminals reduced by other processors as if they 
were terminal symbols. Moreover, Fischer’s approach is only defined for LR 
grammars and cannot easily be extended to GLR. 

Fig. 13.16. Performance vs. number of processors



Parallelization by allocating different branches of the stack, cf. [Tanaka
and Numazaki, 1989], [Numazaki and Tanaka, 1990], [Thompson et al., 1991],
was already discussed in Section 13.1.

Thompson [1989, 1994] obtains several parallel chart parsers by distribut- 
ing the chart. A parallel parser based on systolic matrix multiplication is also 
given in [Thompson, 1994]. 



13.8 Conclusion 

The Parallel Bottom-up Tomita parser has been developed as a cross- 
fertilization of Tomita’s algorithm with the bottom-up parallelization of Ear- 
ley’s algorithm. This could be accomplished rather straightforwardly because, 
in Chapter 12, we have shown that the algorithms of Tomita and Earley have 
underlying parsing schemata that are almost identical. 

The parallelization does not offer a tremendous speed-up, but we
nevertheless see it as a moderate success. Experimental results show a reduction
of the complexity in terms of the length of the sentence (for a few example
grammars, not in the worst case) of a factor O(√n) by using n processors.
The remaining O(√n) is spent on the slightly more complicated structure of
the parser, communication, and uneven load balancing. We have shown that
parallel parsing is feasible.

A spin-off effect of PBT is that the parsing table is constructed in linear
time. Construction of LR parsing tables for large grammars is very costly.
Hence, a PBT parser, also in a sequential implementation, is a useful tool for
development and debugging of grammars. Whenever the grammar is changed,
a new parsing table can be constructed on the fly.

In the previous chapters we have discussed how parsing schemata can be
instantiated to parsing algorithms of various kinds. Such algorithms can be 
coded into programming languages and then executed on a computer system. 

As a last application of the theory of parsing schemata we will look at 
the possibilities of coding schemata (or, to be precise, uninstantiated parsing 
systems) directly into hardware. Several connectionist approaches to pars- 
ing have been proposed, cf. Fanty, [1986], Selman and Hirst, [1987], Howells, 
[1988], Nakagawa and Mori, [1988], and Nijholt, [1990], in which a large num- 
ber of simple processing units are linked into a highly interconnected network. 
For an arbitrary parsing system we can define a boolean circuit, which is a 
particularly simple kind of connectionist network. 

Because of the massive parallelism involved, connectionist implementa- 
tions of parsers can be really fast. This might be of interest for real-time 
systems. Furthermore, it has been argued that it is possible to integrate such 
a connectionist syntactic parser with semantic and pragmatic analysis (cf., 
e.g., [Waltz and Pollack, 1988], [Cottrell, 1989]). We will not further go into 
these aspects, and concentrate on syntactic analysis. 

In order to investigate how fast parsing can be done in principle, we push 
parallelism to the limit and investigate logarithmic-time boolean circuits. 
We obtain complexity bounds that conform to those known for fast parallel 
algorithms on parallel random access machines. This result is of theoretical, 
rather than practical value, however, because the number of processing units 
and the connectivity is unrealistically high. 

This chapter is almost self-contained. Some basic understanding of parsing 
systems and schemata is needed and there are some references to examples 
in Chapters 4-6. The general idea of logarithmic-time parsing, for the sake of 
simplicity exemplified by binary branching grammars, is explained in detail. 

A short recapitulation and some additional concepts specific to this chap- 
ter are given in Section 14.1. We will make a tiny change in the notation 
of parsing systems. Unlike the previous chapters, the focus is now on unin- 
stantiated parsing systems: a network is constructed that can parse arbitrary 
sentences according to some specific grammar. 

In 14.2 we present a recognizing network for binary branching grammars. 
In 14.3 this is extended to a parsing network that encodes a shared forest 
for a given sentence. In 14.4 we filter irrelevant parts from the network and
briefly discuss how this network construction can be applied to arbitrary
context-free grammars.

A logarithmic-time parallel parsing algorithm, which is a slight modifi- 
cation of Rytter’s algorithm, is presented in Section 14.5. The fact that the 
algorithm is indeed logarithmic-time is proven in 14.6. A boolean circuit im- 
plementation is given in 14.7. In 14.8 we look at this problem from a more 
general perspective; Rytter’s algorithm can be seen as a specific instance of 
a more general notion of conditional parsing systems that can be used to 
flatten trees of deduction steps.

Related approaches are briefly discussed in 14.9, conclusions follow in
14.10. 

This chapter is based on a technical report [Sikkel, 1990], parts of which 
have been published in [Sikkel and Nijholt, 1991]. The presentation has been 
improved, however, by making use of parsing schemata. In particular the 
generalization to logarithmic-time boolean circuits for arbitrary grammars 
follows straightforwardly as a combination of different results. 



14.1 Preliminary concepts 

In this chapter, the emphasis is more on uninstantiated parsing systems than
instantiated parsing systems and parsing schemata. We will give a definition 
of uninstantiated parsing systems that is slightly different (from Definition 
4.23), so as to allow these systems to be implemented in boolean circuits. 

The notational conventions for context-free grammars that were introduced
in Section 3.1 apply throughout this chapter. We write $G = (N, \Sigma, P, S)$
for a context-free grammar with nonterminals $N$, terminals $\Sigma$,
productions $P$ and start symbol $S$. We write $L(G)$ for the language generated
by $G$, i.e., $a_1 \ldots a_n \in L(G)$ iff $S \Rightarrow^* a_1 \ldots a_n$. We write
$A, B, \ldots$ for nonterminal symbols; $a, b, \ldots$ for terminal symbols;
$X, Y, \ldots$ for arbitrary symbols; $\alpha, \beta, \ldots$ for arbitrary strings
of symbols. Positions in the string $a_1 \ldots a_n$ are denoted by
$h, i, j, k, \ldots$

An instantiated parsing system for some grammar $G$ and an arbitrary
string $a_1 \ldots a_n$ is a triple $\mathbb{P}(a_1 \ldots a_n) = \langle \mathcal{I}, H, D \rangle$
with $\mathcal{I}$ a set of items, $H$ an initial set of items (also called hypotheses)
and $D$ a set of deduction steps that allow new items to be derived from
already known items. The hypotheses in $H$ encode the sentence that is to be
parsed. For a sentence $a_1 \ldots a_n$ we take

$H = \{[a_1, 0, 1], \ldots, [a_n, n-1, n], [\$, n, n+1]\}$; (14.1)

at the $(n+1)$-st position we always add an end-of-sentence marker $\$$. Note
that different hypotheses $[a, i-1, i]$ and $[b, i-1, i]$ may occur if the $i$-th word
falls into different lexical categories. The hypotheses are always defined by
(14.1). Deduction steps in $D$ are of the form

$\eta_1, \ldots, \eta_k \vdash \xi$.

The items $\eta_1, \ldots, \eta_k \in H \cup \mathcal{I}$ are called the antecedents and the item
$\xi \in \mathcal{I}$ is called the consequent of a deduction step. If all antecedents of a
deduction step are recognized by a parser, then the consequent should also be
recognized. The set of valid items $\mathcal{V}(\mathbb{P}(a_1 \ldots a_n))$ is the smallest subset of
$\mathcal{I}$ that contains the consequents of those deduction steps that have only
hypotheses and valid items as antecedents.
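The definition of valid items amounts to a least fixpoint, which can be sketched in a few lines of Python. This is an illustration only and not part of the original presentation: items are modelled as tuples `(symbol, i, j)`, a deduction step as a pair `(antecedents, consequent)`, and the names `hypotheses` and `valid_items` are chosen here for the example.

```python
def hypotheses(categories, end_marker="$"):
    """Build H as in (14.1).

    categories[i] is the set of lexical categories of word i+1
    (an assumed input format for this sketch).
    """
    n = len(categories)
    hyps = {(cat, i, i + 1) for i, cats in enumerate(categories) for cat in cats}
    hyps.add((end_marker, n, n + 1))  # end-of-sentence marker at position n+1
    return hyps


def valid_items(hyps, steps):
    """Smallest set of consequents derivable from hypotheses and valid items."""
    known = set(hyps)   # hypotheses plus everything derived so far
    valid = set()       # derived consequents only
    changed = True
    while changed:
        changed = False
        for antecedents, consequent in steps:
            if consequent not in valid and all(a in known for a in antecedents):
                valid.add(consequent)
                known.add(consequent)
                changed = True
    return valid
```

The loop terminates whenever the set of deduction steps is finite, which is the situation that the restriction to a maximum string length is meant to guarantee later in this chapter.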

Whether the hypotheses $H$ are part of the item set $\mathcal{I}$ or outside $\mathcal{I}$ does
not really matter. In previous chapters we have treated hypotheses as separate
entities (i.e., $H \cap \mathcal{I} = \emptyset$), simply because that was more convenient for
specifying parsing systems. In this chapter we have strong reasons for changing
this convention. We will consider the items of the form $[a, i-1, i]$ to be
included in $\mathcal{I}$.^1 So we find, for any given string $a_1 \ldots a_n$, that
$[\$, n, n+1]$ is the only hypothesis that is not included in $\mathcal{I}$.

An uninstantiated parsing system specifies all objects and deduction
steps that can be used to parse sentences according to some grammar G. 
We are interested in constructing parsers by means of boolean circuits. The 
construction of a parser cannot be dependent on any particular string, so we 
have to include all potential hypotheses for all strings. 

Definition 14.1. ((uninstantiated) parsing system)

An uninstantiated parsing system for some grammar $G$ is a triple
$\langle \mathcal{I}, \mathcal{H}, D \rangle$ with the set of potential hypotheses $\mathcal{H}$ defined by

$\mathcal{H} = \{[a, i-1, i] \mid a \in \Sigma \cup \{\$\} \wedge i \geq 1\}$ (14.2)

An (uninstantiated) parsing system can be instantiated for a particular string
$a_1 \ldots a_n$ by selecting a set of actual hypotheses $H \subset \mathcal{H}$ according to (14.1).

□ 

In boolean circuits the remaining potential hypotheses $\mathcal{H} \setminus H$ will still be
included in the system, but simply remain invalid.

We write $\mathbb{P}$ for an uninstantiated parsing system, and $\mathbb{P}(a_1 \ldots a_n)$ for an
instantiated parsing system. A parsing schema $\mathbf{P}$ defines a parsing system
$\mathbb{P} = \mathbf{P}(G)$ for all $G$ in some class of context-free grammars.



^1 The reason for this is that we want items of the form $[a, i-1, i]$ to be included
in the set of valid items and the set of parsable items that will be introduced
in Section 14.3. When we present a set of valid items, e.g., in the form of a CYK
recognition table, we usually do not include the end-of-sentence marker. Thus
Figures 14.1 and 14.3 cover exactly the set of valid and parsable items, respectively.

Definition 14.2. (binary branching grammar)

A context-free grammar $G$ is binary branching if all productions in $P$ have
the form

$A \rightarrow XY$.

We write $\mathcal{BB}$ for the set of binary branching context-free grammars. □

Binary branching grammars are strongly related to, but formally different 
from grammars in Chomsky Normal Form. The former have the advantage 
that CYK parsers are strictly binary as well. This will be of help when we 
convert linear-time parsing networks to logarithmic-time parsing networks; 
such networks are easiest to define on binary systems.^2

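Definition 14.3. (binary parsing system)

An (uninstantiated) parsing system $\mathbb{P} = \langle \mathcal{I}, \mathcal{H}, D \rangle$ is called binary if

$D \subseteq (\mathcal{H} \cup \mathcal{I})^2 \times \mathcal{I}$,

that is, every deduction step has exactly 2 antecedents. □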

Example 14.4. (CYKbb)

As an example, we will define a slightly modified CYK parsing schema for
binary branching grammars. For an arbitrary grammar $G \in \mathcal{BB}$ we define a
parsing system $\mathbb{P} = \langle \mathcal{I}_{CYKbb}, \mathcal{H}, D_{CYKbb} \rangle$ by

$\mathcal{I}_{CYKbb} = \{[A, i, j] \mid A \in N \wedge 0 \leq i \wedge i+1 < j\}$
$\cup\ \{[a, i-1, i] \mid a \in \Sigma \wedge i \geq 1\}$,

$D_{CYKbb} = \{[X, i, j], [Y, j, k] \vdash [A, i, k] \mid A \rightarrow XY \in P\}$,

and $\mathcal{H}$ according to (14.2).

The system can be instantiated by choosing a set of hypotheses $H \subset \mathcal{H}$ for a
string $a_1 \ldots a_n$ according to (14.1). □

Example 14.5.

As a more concrete example, we will look at the instantiated parsing system

$\mathbf{CYKbb}(G_2)$(the flies like the marmelade).

The grammar $G_2$ was defined (in Chapter 2) by the productions

S -> NP VP | S PP,

NP -> *det *n | NP PP,

VP -> *v NP,

PP -> *prep NP.

^2 Generalization to parsing systems of arbitrary arity will follow later. So this is
not an essential restriction on the types of grammars and languages that can be
handled (binary branching grammars do not generate sentences of length 1 and
0) but a temporary restriction to simplify the presentation.

Lexical categories of the relevant words are defined by

*n -> flies | marmelade,

*det -> the,

*v -> flies | like,

*prep -> like;

but in our binary approach these are not considered to be part of the grammar.
So we find a set of hypotheses

[*det,0,1], [*n,1,2], [*v,1,2], [*v,2,3], [*prep,2,3],

[*det,3,4], [*n,4,5], [$,5,6].

The set of valid items can be represented in the usual upper triangular CYK
recognition table. A symbol $X$ is written into table entry $T_{i,j}$ if $[X, i, j]$ is
valid. The table representing the valid items for the given grammar and
sentence is shown in Figure 14.1. □

Fig. 14.1. CYK recognition table for Example 14.5.
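The valid items of Example 14.5 can be recomputed mechanically. The following Python sketch applies the CYKbb deduction steps by increasing span width; it is an illustration under assumptions, not the author's implementation. The names `G2`, `LEXICON` and `cyk_valid_items`, and the encoding of a production $A \rightarrow XY$ as a triple `(A, X, Y)`, are introduced here for the example. The end-of-sentence marker is left out, as in the recognition table.

```python
# Productions of G2 as (A, X, Y) triples, one per production A -> X Y.
G2 = [("S", "NP", "VP"), ("S", "S", "PP"),
      ("NP", "*det", "*n"), ("NP", "NP", "PP"),
      ("VP", "*v", "NP"), ("PP", "*prep", "NP")]

# Lexical categories of the relevant words.
LEXICON = {"the": {"*det"}, "flies": {"*n", "*v"},
           "like": {"*v", "*prep"}, "marmelade": {"*n"}}


def cyk_valid_items(productions, words, lexicon):
    """Return the hypotheses [a,i-1,i] together with all derivable [A,i,k]."""
    n = len(words)
    table = {(cat, i, i + 1) for i, w in enumerate(words) for cat in lexicon[w]}
    # An item of width w only depends on items of smaller width,
    # so one pass by increasing width reaches the fixpoint.
    for width in range(2, n + 1):
        for i in range(0, n - width + 1):
            k = i + width
            for j in range(i + 1, k):
                for a, x, y in productions:
                    if (x, i, j) in table and (y, j, k) in table:
                        table.add((a, i, k))
    return table


items = cyk_valid_items(G2, "the flies like the marmelade".split(), LEXICON)
print(sorted(item for item in items if not item[0].startswith("*")))
```

The printed list contains the nonterminal items only; both `("S", 0, 5)` and `("PP", 2, 5)` are among them.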



In Chapters 3 and 4 we have enhanced parsing systems with a notion of
correctness. For each sentence length $n$ there is a set of final items
$\mathcal{F}^{(n)} \subset \mathcal{I}$.
An item can be seen as the set of trees that conform to the properties specified
by the item. A final item, then, can be seen as a set of parse trees. In a correct
parsing system, a final item is valid if and only if it contains a parse tree for
the given sentence. In the CYK case there is only a single final item for each
$n$: we find $\mathcal{F}^{(n)} = \{[S, 0, n]\}$. In general, there can be several final items. In
an Earley-type parsing system (cf. Section 4.6), we have final items of the
form $[S \rightarrow \gamma\bullet, 0, n]$, as many as there are productions with left-hand side $S$.



14.2 Recognizing networks

Before we define boolean circuits, we will first augment parsing systems with
a special item accept, that is valid if and only if $a_1 \ldots a_n \in L(G)$. In a
recognizing network, the well-formedness of the string that is being parsed
can be read off by inspecting the status of a special accept node.

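Definition 14.6. (augmented parsing system)

For each (uninstantiated) parsing system $\langle \mathcal{I}, \mathcal{H}, D \rangle$ with $\mathcal{H}$ as in (14.2), an
augmented parsing system $\hat{\mathbb{P}} = \langle \hat{\mathcal{I}}, \mathcal{H}, \hat{D} \rangle$ is defined by

$\hat{\mathcal{I}} = \mathcal{I} \cup \{accept\}$,

$\hat{D} = D \cup \{\xi, [\$, i, i+1] \vdash accept \mid i \geq 0 \wedge \xi \in \mathcal{F}^{(i)}\}$,

with $\mathcal{F}^{(i)}$ the set of final items for a string of length $i$.

An instantiated parsing system $\langle \mathcal{I}, H, D \rangle$, likewise, can be extended to an
augmented instantiated parsing system $\langle \hat{\mathcal{I}}, H, \hat{D} \rangle$. In that case, the extension
to $D$ will be ineffective for any $i \neq n$. □

Corollary 14.7.

Let $\mathbb{P}$ be a correct parsing system (cf. Definitions 4.20 and 4.22) and $\hat{\mathbb{P}}$ the
augmented system of $\mathbb{P}$. Then it holds for any $a_1 \ldots a_n \in \Sigma^*$ that

$accept \in \mathcal{V}(\hat{\mathbb{P}}(a_1 \ldots a_n))$ if and only if $a_1 \ldots a_n \in L(G)$. □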

We will now define a boolean circuit for an arbitrary parsing system and
strings up to some maximum string length $\ell$, based on the connectionist
network of Fanty [1985, 1986]. A boolean circuit can be seen as a directed graph,
where the nodes are processing units and the edges are connections between
these processing units. A node has a set of inputs (incoming connections from
other nodes) and a set of outputs (outgoing connections to other nodes). Each
node can be in two states: activated (“on”) or not activated (“off”). If a node
is “on”, it sends an “on” signal on all of its outputs; if a node is “off”, it sends
an “off” signal on all of its outputs. The activation of a node is a function of
the signals that it receives on its inputs. We will only use two different kinds
of nodes.

• An OR-node is “on” if it receives at least one “on” signal (and an arbitrary 
number of “off” signals). 

We will indicate OR-nodes by double parentheses (( )). 

• An AND-node is “on” if it does not receive any “off” signal (and an arbitrary 
number of “on” signals). 

We will indicate AND-nodes by double acute angular brackets $\langle\langle\ \rangle\rangle$.

A parsing system has an infinite number of items and deduction steps. For
implementation in a boolean circuit, however, it is required that the system
be finite. We will obtain this by assuming a maximum string length $\ell$ and
building a network that can handle all strings $a_1 \ldots a_n$ with $0 \leq n \leq \ell$. For
any given $\ell$ we can define a restricted augmented system
$\hat{\mathbb{P}}_\ell = \langle \hat{\mathcal{I}}_\ell, \mathcal{H}_\ell, \hat{D}_\ell \rangle$
that comprises the part of $\hat{\mathbb{P}}$ that is relevant for strings up to a length $\ell$. We
will not take the trouble to give a formal definition of $\hat{\mathbb{P}}_\ell$; for any particular
system it will always be clear which items are in $\hat{\mathcal{I}}_\ell$ and which are in
$\hat{\mathcal{I}} \setminus \hat{\mathcal{I}}_\ell$; similarly for $\mathcal{H}_\ell$ and $\hat{D}_\ell$.

It is not necessarily true that restricting a system to a maximum sentence 
length makes it finite. Let G be a cyclic grammar, and T a tree-based parsing 
system for G. That is, every single tree is a separate item. Strings of finite 
length generate an infinite number of trees for cyclic grammars, hence a 
system that is restricted to some maximum string length will still be infinite. 
In an item-based system it is possible (but not necessarily the case) that this 
infinite number of trees is represented by a finite number of items. E.g. the 
Earley schemata in Examples 4.32 and 4.34, when applied to cyclic grammars, 
will yield finite restricted parsing systems. 



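Example 14.8.

Let $G$ be a binary branching grammar. The parsing system $\mathbb{P}$ in Example
14.4 can be augmented to $\hat{\mathbb{P}}$ and restricted to $\hat{\mathbb{P}}_\ell$ as follows. The system
$\hat{\mathbb{P}}_\ell = \langle \hat{\mathcal{I}}_\ell, \mathcal{H}_\ell, \hat{D}_\ell \rangle$ is defined by

$\hat{\mathcal{I}}_\ell = \{[X, i, j] \in \mathcal{I} \mid j \leq \ell\} \cup \{accept\}$;

$\mathcal{H}_\ell = \{[a, i-1, i] \mid a \in \Sigma \wedge 1 \leq i \leq \ell\} \cup \{[\$, i, i+1] \mid 0 \leq i \leq \ell\}$;

$D^{(1)}_\ell = \{[X, i, j], [Y, j, k] \vdash [A, i, k] \mid A \rightarrow XY \in P\}$,

$D^{(2)}_\ell = \{[S, 0, i], [\$, i, i+1] \vdash accept\}$,

$\hat{D}_\ell = D^{(1)}_\ell \cup D^{(2)}_\ell$.

Note that it is not necessary to define bounds on the position markers in $\hat{D}_\ell$.
All items in a deduction step, by definition, must be drawn from the item set
or the hypotheses. That is, it holds by definition that
$\hat{D}_\ell \subseteq \wp_{fin}(\hat{\mathcal{I}}_\ell \cup \mathcal{H}_\ell) \times \hat{\mathcal{I}}_\ell$.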

□ 



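Definition 14.9. (recognizing network)

Let $\mathbb{P} = \langle \mathcal{I}, \mathcal{H}, D \rangle$ be an arbitrary parsing system, and $\hat{\mathbb{P}}_\ell$ the augmented
system restricted to some maximum sentence length $\ell$. A recognizing network
for $\hat{\mathbb{P}}_\ell$ is a boolean circuit that has the following nodes:

• an OR-node $((\eta))$ for each $\eta \in \hat{\mathcal{I}}_\ell \cup \mathcal{H}_\ell$;

• an AND-node $\langle\langle \eta_1, \ldots, \eta_k; \xi \rangle\rangle$ for each $\eta_1, \ldots, \eta_k \vdash \xi \in \hat{D}_\ell$;

and the following connections:

• an edge $((\eta_i)) \rightarrow \langle\langle \eta_1, \ldots, \eta_k; \xi \rangle\rangle$ for each
$\eta_1, \ldots, \eta_k \vdash \xi \in \hat{D}_\ell$ and $1 \leq i \leq k$;

• an edge $\langle\langle \eta_1, \ldots, \eta_k; \xi \rangle\rangle \rightarrow ((\xi))$ for each
$\eta_1, \ldots, \eta_k \vdash \xi \in \hat{D}_\ell$. □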

Initially, all nodes are “off”. It is assumed that the valid hypotheses are
activated (and will remain activated) by external stimuli, derived from
the “real” sentence. When this happens, a wave of activation will spread
through the network.

It is easiest to think of time as divided into discrete clock ticks. At $t = 0$,
only the valid hypotheses are “on”. At $t = i$, for $i > 0$, the outputs of a node
are determined as a function of the inputs at $t = i - 1$. From the set-up of
the recognizing network it is clear that some “off” nodes will be turned “on”
at some moment in time, but no “on” node will be turned “off” again. If the
network is finite, it must become stable after a finite amount of time.
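The clock-tick behaviour described above can be simulated directly. The sketch below is a hypothetical illustration: the function name `run_recognizing_network` and the data layout (deduction steps as `(antecedents, consequent)` pairs, one AND-node per step, one OR-node per item) are assumptions made for this example. All nodes are updated synchronously from the state at the previous tick.

```python
def run_recognizing_network(items, steps, active_hypotheses):
    """Simulate the network until it is stable.

    Returns the set of OR-nodes that are "on" and the number of clock ticks
    that were needed to reach the stable state.
    """
    or_on = {item: item in active_hypotheses for item in items}  # state at t = 0
    and_on = [False] * len(steps)
    ticks = 0
    while True:
        # AND-node: "on" if none of its inputs was "off" at the previous tick.
        new_and = [all(or_on[a] for a in ants) for ants, _ in steps]
        # OR-node: "on" if some incoming AND-node was "on" at the previous tick.
        # Nodes never switch off again, so the old state is carried over.
        new_or = dict(or_on)
        for fired, (_, consequent) in zip(and_on, steps):
            if fired:
                new_or[consequent] = True
        if new_and == and_on and new_or == or_on:
            return {item for item, on in or_on.items() if on}, ticks
        and_on, or_on = new_and, new_or
        ticks += 1


a, b, end, s = ("a", 0, 1), ("b", 1, 2), ("$", 2, 3), ("S", 0, 2)
steps = [((a, b), s), ((s, end), "accept")]
on, ticks = run_recognizing_network({a, b, end, s, "accept"}, steps, {a, b, end})
print("accept" in on, ticks)
```

In this simulation every deduction step costs two ticks, one for the AND-node and one for the OR-node of its consequent.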

An example of a tiny part of a network (after Fanty [1986]) is shown in
Figure 14.2. Suppose that there is a production $A \rightarrow BC$; then there are three
AND-nodes for deduction steps that may activate a node (([A,2,8])) from valid
pairs of nodes (([B,2,j])), (([C,j,8])) for $j = 4, 5, 6$. Hence (([A,2,8])) will be
activated if there is (at least) one pair of applicable B and C nodes where
both nodes are “on”.



Fig. 14.2. A fraction of a recognizing network



Theorem 14.10. (validity in a finite recognizing network)

Let $\mathbb{P}$ be a parsing system and $\hat{\mathbb{P}}_\ell$ for some maximum string length $\ell$ be finite.
Let $a_1 \ldots a_n$ with $n \leq \ell$ be the input to the recognizing network according to
Definition 14.9. Then the network will stabilize after a finite number of clock
ticks. Moreover, a node $((\xi))$ for $\xi \in \hat{\mathcal{I}}_\ell$ will be “on” in the stable network if
and only if $\xi \in \mathcal{V}(\hat{\mathbb{P}}(a_1 \ldots a_n))$.

Proof: trivial. □

Consequently, the accept node will be activated if and only if $a_1 \ldots a_n \in L(G)$.



14.3 Parsing networks

A recognizing network computes the correctness of a string. The accept node 
will be activated if and only if the presented string constitutes a valid sen- 
tence. Furthermore, each node that represents an item will be activated if 
and only if the item is valid. 

It is not possible to yield a set of parses as output of a boolean circuit 
(unless we add nodes that could represent all possible parse trees). But we 
can do better than offer only a set of valid items. We can make a distinction 
between 

• valid items that represent a partial parse tree for the given string, 

• valid items that do not represent any tree that is part of a parse tree for 
the given string. 

The former type of valid items will be called parsable items.

The parsable items for Example 14.5 are shown in Figure 14.3. The item 
[PP,2,5] in Figure 14.1 has been deleted; it is valid, but not used in the 
context of the entire sentence. Similarly, the hypotheses that “flies” is a verb 
and “like” is a preposition are valid but not parsable for this sentence.

Fig. 14.3. CYK table with parsable items for Example 14.5



In the sequel we will extend recognizing networks to parsing networks, 
that compute all parsable items for a given sentence. But first, we give a 
formal definition of parsability. 

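Definition 14.11. (parsable items)

Let $\hat{\mathbb{P}} = \langle \hat{\mathcal{I}}, \mathcal{H}, \hat{D} \rangle$ be an augmented parsing system,
$\mathcal{V} = \mathcal{V}(\hat{\mathbb{P}}(a_1 \ldots a_n))$
the set of valid items for some string $a_1 \ldots a_n$. The set of parsable items
$\mathcal{W} = \mathcal{W}(\hat{\mathbb{P}}(a_1 \ldots a_n))$ is defined as the smallest set satisfying

(i) if $accept \in \mathcal{V}$ then $accept \in \mathcal{W}$;

(ii) if $\xi \in \mathcal{W}$ and there are $\eta_1, \ldots, \eta_k \in \mathcal{V}$ such that
$\eta_1, \ldots, \eta_k \vdash \xi \in \hat{D}$ then $\eta_1, \ldots, \eta_k \in \mathcal{W}$.

For an unaugmented parsing system $\mathbb{P} = \langle \mathcal{I}, \mathcal{H}, D \rangle$ and a string $a_1 \ldots a_n$, an
item $\xi \in \mathcal{I}$ is called parsable if it is parsable in $\hat{\mathbb{P}}$ for $a_1 \ldots a_n$. □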

The following corollary can be employed for the local design of the net- 
work: 

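Corollary 14.12.

Let $\hat{\mathbb{P}} = \langle \hat{\mathcal{I}}, \mathcal{H}, \hat{D} \rangle$ be an augmented parsing system. An item
$\xi \neq accept$ in $\hat{\mathcal{I}}$ is parsable for some string $a_1 \ldots a_n$ if and only if
there are $\zeta_1, \ldots, \zeta_k, \eta \in \hat{\mathcal{I}}$ such that

(i) $\xi, \zeta_1, \ldots, \zeta_k$ are valid,

(ii) $\xi, \zeta_1, \ldots, \zeta_k$ are, in some order, the antecedents of a deduction step
in $\hat{D}$ with consequent $\eta$,

(iii) $\eta$ is parsable. □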

Armed with Definition 14.11 and Corollary 14.12 we can now extend the rec- 
ognizing network to a parsing network. A node that represents an item in 
the recognizing network will be activated iff the item is valid. A supplemen- 
tary node in the parsing network will be activated iff the item is parsable. 
After accept has been turned “on”, a wave of activation spreads through the
supplementary part of the network in reverse direction. 
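The reverse wave of activation can be sketched as a backward search from accept, following Definition 14.11. The Python code below is an assumption-laden illustration: `parsable_items` is a name chosen here, deduction steps are `(antecedents, consequent)` pairs, and the set `valid` is taken to contain the actual hypotheses as well as the valid items.

```python
def parsable_items(valid, steps, accept="accept"):
    """Compute the parsable items from the valid ones.

    Starting from accept, every deduction step whose consequent is parsable
    and whose antecedents are all valid makes those antecedents parsable.
    """
    parsable = {accept} if accept in valid else set()
    agenda = list(parsable)
    while agenda:
        item = agenda.pop()
        for antecedents, consequent in steps:
            if consequent == item and all(a in valid for a in antecedents):
                for a in antecedents:
                    if a not in parsable:
                        parsable.add(a)
                        agenda.append(a)
    return parsable


a, b, end = ("a", 0, 1), ("b", 1, 2), ("$", 2, 3)
x, y = ("X", 0, 2), ("Y", 0, 2)
steps = [((a, b), x), ((a, b), y), ((x, end), "accept")]
print(parsable_items({a, b, end, x, y, "accept"}, steps))
```

In this toy system the item `("Y", 0, 2)` is valid but not parsable, because no deduction step leads from it to accept.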

Definition 14.13. (parsing network)

Let $\mathbb{P} = \langle \mathcal{I}, \mathcal{H}, D \rangle$ be an arbitrary parsing system, and $\hat{\mathbb{P}}_\ell$ the augmented
system restricted to some maximum sentence length $\ell$. A parsing network for
$\hat{\mathbb{P}}_\ell$ is a boolean circuit that consists of the following nodes:

• OR-nodes $((\eta))$ and $((\mathcal{P}\eta))$ for each $\eta \in \hat{\mathcal{I}}_\ell \cup \mathcal{H}_\ell$;

• AND-nodes $\langle\langle \eta_1, \ldots, \eta_k; \xi \rangle\rangle$ and $\langle\langle \mathcal{P}\, \eta_1, \ldots, \eta_k; \xi \rangle\rangle$
for each $\eta_1, \ldots, \eta_k \vdash \xi \in \hat{D}_\ell$;

and the following connections:

• $((accept)) \rightarrow ((\mathcal{P} accept))$,

• $((\eta_i)) \rightarrow \langle\langle \eta_1, \ldots, \eta_k; \xi \rangle\rangle$ for $\eta_1, \ldots, \eta_k \vdash \xi \in \hat{D}_\ell$
and $1 \leq i \leq k$,

• $\langle\langle \eta_1, \ldots, \eta_k; \xi \rangle\rangle \rightarrow ((\xi))$ for $\eta_1, \ldots, \eta_k \vdash \xi \in \hat{D}_\ell$,

• $\langle\langle \eta_1, \ldots, \eta_k; \xi \rangle\rangle \rightarrow \langle\langle \mathcal{P}\, \eta_1, \ldots, \eta_k; \xi \rangle\rangle$
for $\eta_1, \ldots, \eta_k \vdash \xi \in \hat{D}_\ell$,

• $((\mathcal{P}\xi)) \rightarrow \langle\langle \mathcal{P}\, \eta_1, \ldots, \eta_k; \xi \rangle\rangle$ for $\eta_1, \ldots, \eta_k \vdash \xi \in \hat{D}_\ell$,

• $\langle\langle \mathcal{P}\, \eta_1, \ldots, \eta_k; \xi \rangle\rangle \rightarrow ((\mathcal{P}\eta_i))$ for
$\eta_1, \ldots, \eta_k \vdash \xi \in \hat{D}_\ell$ and $1 \leq i \leq k$. □

The supplementary $\mathcal{P}$ nodes are used to distinguish the parsable items from
the valid nonparsable items. When these (and the connected edges) are
deleted, the recognizing network of Definition 14.9 remains. An example of a
fraction of a parsing network is shown in Figure 14.4. This is a simplification
of the recognizing network of Fanty [1986].^3

^3 In Fanty’s network, there is (in our notation) also an edge $((\xi)) \rightarrow ((\mathcal{P}\xi))$ for
every $\xi \in \hat{\mathcal{I}}_\ell$. Moreover, a node $((\mathcal{P}\xi))$ is a special kind of “AND-OR-node” that
ORs all signals coming from above and ANDs the result with the signal from
$((\xi))$. This extra edge can be deleted because any node
$\langle\langle \mathcal{P}\, \zeta_1, \ldots, \zeta_k; \eta \rangle\rangle$ that
provides input “from above” to $((\mathcal{P}\xi))$ can be activated only if $((\xi))$ has been
activated. More importantly, the special type of node introduced by Fanty reduces
to a conventional OR-node. Fanty’s original design - which was duly copied by
Nijholt [1990] and Sikkel [1990] - is correct but unnecessarily complicated.

Fig. 14.4. A fraction of a parsing network

Theorem 14.14. (validity in a finite parsing network)

Let $\mathbb{P}$ be a parsing system and $\hat{\mathbb{P}}_\ell$ for some maximum string length $\ell$ be finite.
Let $a_1 \ldots a_n$ with $n \leq \ell$ be the input to the parsing network according to
Definition 14.13. Then the network will stabilize after a finite number of clock
ticks.

A node $((\mathcal{P}\xi))$ for $\xi \in \hat{\mathcal{I}}_\ell$ will be “on” in the stable network if and only if
$\xi \in \mathcal{W}(\hat{\mathbb{P}}(a_1 \ldots a_n))$.

Furthermore, a node $\langle\langle \mathcal{P}\, \eta_1, \ldots, \eta_k; \xi \rangle\rangle$ will be “on” in the stable
network if and only if $\{\eta_1, \ldots, \eta_k, \xi\} \subseteq \mathcal{W}(\hat{\mathbb{P}}(a_1 \ldots a_n))$.

Proof: straightforward from the above discussion. □

In a CYK-like network (and in many other networks that are derived from
sensible parsing schemata) we can see the activated $\mathcal{P}$ nodes as a representation
of a shared parse forest (in Chapters 12 and 13 also called packed
shared parse forest). A parse node $((\mathcal{P}[X, i, j]))$ will be activated if and
only if $X$ occurs in a parse of $a_1 \ldots a_n$ as a constituent that spans the
substring $a_{i+1} \ldots a_j$. Moreover, any pair of constituents $Y, Z$ into which $X$
can be decomposed can be found by inspecting the activity of the nodes
$\langle\langle \mathcal{P}\, [Y, i, k], [Z, k, j]; [X, i, j] \rangle\rangle$.



14.4 Some further issues

In the previous sections we have defined the basics of boolean circuit imple- 
mentations of parsing schemata. There are some further issues, treated at 
length in [Sikkel, 1990], that can be dealt with rather tersely here. Most of it 
follows directly from results that have been covered elsewhere in this book. 

To start with, one can apply meta-parsing, as it has been called by Nij- 
holt [1990]. The parsing network according to Definition 14.13 may contain 
spurious nodes. If an item [A, i, j] is not parsable for any well-formed sentence, 
then it can just as well be deleted from the network. The network need only 
contain nodes for potentially parsable items. The necessarily unparsable items 
can be separated from the potentially parsable ones as follows. Let
$\langle \mathcal{I}, \mathcal{H}, D \rangle$ be an uninstantiated parsing system, and
$\langle \hat{\mathcal{I}}_\ell, \mathcal{H}_\ell, \hat{D}_\ell \rangle$ the augmented system
for some maximum string length $\ell$. We can instantiate the system by choosing
$H = \mathcal{H}_\ell$, that is, validating all hypotheses. If we run the (simulated) network,
then a parse node $((\mathcal{P}\xi))$ will be activated if and only if there is some string
$a_1 \ldots a_n$ with $n \leq \ell$ such that $\xi \in \mathcal{W}(\hat{\mathbb{P}}_\ell(a_1 \ldots a_n))$.

If $((\mathcal{P}\xi))$ is not activated by meta-parsing, then the item $\xi$ can be deleted
from the parsing system, and the nodes $((\xi))$ and $((\mathcal{P}\xi))$ can be deleted from
the network. The same applies to any deduction step in which a necessarily
unparsable item appears, either as antecedent or as consequent.

The above meta-parsing algorithm takes for granted that we are only
interested in valid sentences. If a string is offered that is not contained in the
language, we might be interested in finding at least those parts that can be
recognized. To that end, we can employ a weaker meta-parsing algorithm that
discards those items $\xi$ for which $\xi \notin \mathcal{V}(\hat{\mathbb{P}}_\ell(a_1 \ldots a_n))$ for any string. The weak
meta-parsing algorithm yields the regular subsystem that has been discussed
in Section 4.5.

The complexity of a boolean circuit parser is measured as follows. The size 
of the network is determined by the number of nodes. The total number of 
connections between nodes is linear in the size of the network, if the number 
of antecedents for any individual deduction step is limited by some small 
constant. This will be the case for all parsing systems that we discuss here.^4
As the time complexity of a network we count the number of clock ticks that
are needed to obtain the final, stabilized situation.

Note that any individual network is finite. The size of the network is
measured as a function of the maximum string length $\ell$ and the size $|G|$ of
the grammar (cf. equation (11.4) on page 239). Let
$\hat{\mathbb{P}}_\ell = \langle \hat{\mathcal{I}}_\ell, \mathcal{H}_\ell, \hat{D}_\ell \rangle$ be a
network, restricted to some maximum string length $\ell$. The size of the network,
then, is simply $O(|\hat{\mathcal{I}}_\ell| + |\mathcal{H}_\ell| + |\hat{D}_\ell|)$.

^4 A counterexample to this assumption is, for example, the GCYK parsing schema
(cf. Example 5.20) applied to grammars with arbitrarily large right-hand sides
of productions. This may yield systems where the number of connections is
quadratic, rather than linear, in the number of nodes.

So, to be formally correct, we should add connectivity as a complexity factor and
in all applicable cases argue that the connectivity is of the same order as the size
of the network.

For a CYK network we find a time complexity of $O(n)$ and a network
size of $O(|G|\ell^3)$: the largest factor is the number of deduction steps
$[X, i, j], [Y, j, k] \vdash [A, i, k]$ for each production $A \rightarrow XY$ and arbitrary
$0 \leq i < j < k \leq \ell$.
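The cubic factor can be checked with a small counting exercise. The sketch below counts, for a given number of productions, all position triples $0 \leq i < j < k \leq \ell$; the function name `cyk_step_bound` is an assumption made for this example. The count is an upper bound: some triples do not yield a deduction step, because nonterminal items must span at least two words.

```python
from math import comb


def cyk_step_bound(num_productions, max_len):
    """Upper bound on the number of CYK deduction steps (AND-nodes)."""
    triples = sum(1
                  for i in range(max_len + 1)
                  for j in range(i + 1, max_len + 1)
                  for k in range(j + 1, max_len + 1))
    # The number of triples is the binomial coefficient C(max_len + 1, 3),
    # which grows as max_len ** 3 / 6.
    assert triples == comb(max_len + 1, 3)
    return num_productions * triples


for max_len in (5, 10, 20, 40):
    print(max_len, cyk_step_bound(6, max_len))
```

Doubling the maximum string length multiplies the bound by roughly eight, as expected for a term of order $\ell^3$.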

Fanty’s network is defined only for binary branching grammars. The same 
technique can be applied to define a boolean circuit parser for arbitrary 
context-free grammars. In Section 3 of [Sikkel, 1990], Fanty’s technique is ap- 
plied to construct a boolean circuit parser based on the algorithm of Chiang 
and Fu [1984] (cf. Example 6.20). A similar network (in fact a simpler one, 
see footnote 3 on page 320) is obtained by applying the network construction 
of Definition 14.13 to a parsing system $\mathbf{ChF}(G)$ for an arbitrary grammar
$G \in \mathcal{CFG}$.

Most parsing systems have a few initial deduction steps that have no 
antecedents. In a parsing network, these are mapped onto AND-nodes with 
no inputs. In the definition of AND-nodes we have anticipated this: an AND- 
node will be activated if none of its inputs is off. This is clearly the case for 
AND-nodes without input, hence all nodes of this type will be active at time 
t = 1. 
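The behaviour of nodes without inputs follows from the way the two node types were defined. A minimal Python sketch, with the helper names `and_node` and `or_node` introduced here purely for illustration:

```python
def and_node(inputs):
    """An AND-node is "on" if it does not receive any "off" signal."""
    return all(inputs)


def or_node(inputs):
    """An OR-node is "on" if it receives at least one "on" signal."""
    return any(inputs)


# A deduction step without antecedents maps onto an AND-node without inputs,
# which is "on" by vacuous truth; an OR-node without inputs stays "off".
print(and_node([]), or_node([]))
```

This is why no separate mechanism is needed to start up deduction steps that have no antecedents.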



14.5 Rytter’s algorithm 

Further on in this chapter, in Section 14.7, we will define a boolean circuit 
implementation of Rytter’s algorithm. The recognition part of such a network 
follows directly from the validity of the Rytter parsing schema and the net- 
work construction of Definition 14.9. Extending the recognizing network to 
a parsing network can - in this particular case - be done rather more simply
than with the construction of Definition 14.13.

The more difficult issue is to prove that the network will stabilize in log- 
arithmic time. A formal proof that Rytter’s algorithm works in logarithmic 
time is given in Section 14.6. In this section we will introduce Rytter’s algo- 
rithm and provide the intuition on which the proof in 14.6 is based. 

As before, we only consider binary branching grammars. This restriction 
is not essential, but of great help to simplify the notation. In Section 14.8 we 
will briefly discuss how the approach can be generalized to parsing systems 
for arbitrary context-free grammars. 

The easiest way to explain Rytter’s algorithm is to start with the items
that are used. CYK uses items of the form $[X, i, j]$. Such an item is valid if
$X \Rightarrow^* a_{i+1} \ldots a_j$. Rytter’s algorithm, in addition, uses items
$[A, h, k; X, i, j]$. Such an item is valid if

$A \Rightarrow^* a_{h+1} \ldots a_i\, X\, a_{j+1} \ldots a_k$.

This can be seen as a CYK item with a gap; the missing part $[X, i, j]$ still
has to be filled, in order to obtain the validity of $[A, h, k]$. See Figure 14.5.
We could also see such an item as a conditional CYK item: validity of
$[A, h, k; X, i, j]$ can be interpreted as

if $[X, i, j]$ is valid then $[A, h, k]$ is also valid.

Fig. 14.5. Different kinds of Rytter items

Let $A \rightarrow XY$ be a production in $P$. A CYK deduction step
$[X, i, j], [Y, j, k] \vdash [A, i, k]$ can be refined into two steps:

$[X, i, j] \vdash [A, i, k; Y, j, k]$,

$[A, i, k; Y, j, k], [Y, j, k] \vdash [A, i, k]$.

The gap in the intermediate item is rightmost. Another possibility to deduce
$[A, i, k]$ is by means of an item with a leftmost gap $[A, i, k; X, i, j]$.

Specific for Rytter’s algorithm is the addition of a simple combination 
rule: two conditional items can be combined into a single one if the “outside” 
of one item matches the “inside” of another item. A graphical impression of 
the different types of deduction steps in Rytter’s algorithm is shown in Figure 
14.6. 

A parsing schema for Rytter’s algorithm for binary branching grammars 
is defined as follows. This is a minor modification of the schema that was 
presented in Example 5.23. 

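Schema 14.15. (Rbb)

For an arbitrary binary branching grammar $G \in \mathcal{BB}$ we define a parsing
system $\mathbb{P}_{Rbb} = \langle \mathcal{I}_{Rbb}, \mathcal{H}, D_{Rbb} \rangle$ by

$\mathcal{I}^{(1)} = \{[A, i, j] \mid A \in N \wedge 0 \leq i \wedge i+1 < j\}$
$\cup\ \{[a, i-1, i] \mid a \in \Sigma \wedge i \geq 1\}$,

$\mathcal{I}^{(2)} = \{[A, h, k; X, i, j] \mid [A, h, k] \in \mathcal{I}^{(1)} \wedge [X, i, j] \in \mathcal{I}^{(1)}$
$\wedge\ h \leq i < j \leq k \wedge (h \neq i \text{ or } j \neq k)\}$,

$\mathcal{I}_{Rbb} = \mathcal{I}^{(1)} \cup \mathcal{I}^{(2)}$;

$\mathcal{H} = \{[a, i-1, i] \mid a \in \Sigma \wedge i \geq 1\} \cup \{[\$, i, i+1] \mid i \geq 0\}$;

$D^{(1)} = \{[X, i, j] \vdash [A, i, k; Y, j, k] \mid A \rightarrow XY \in P\}$,

$D^{(1')} = \{[Y, j, k] \vdash [A, i, k; X, i, j] \mid A \rightarrow XY \in P\}$,

$D^{(1'')} = \{[A, h, k; X, i, j], [X, i, j] \vdash [A, h, k]\}$,

$D^{(2)} = \{[A, h, m; B, i, l], [B, i, l; X, j, k] \vdash [A, h, m; X, j, k]\}$,

$D_{Rbb} = D^{(1)} \cup D^{(1')} \cup D^{(1'')} \cup D^{(2)}$.

The system $\mathbb{P}_{Rbb}$ can be augmented to $\hat{\mathbb{P}}_{Rbb}$ in the usual way, and restricted
to a maximum sentence length $\ell$ by considering only items $[X, i, j]$ with $j \leq \ell$
and $[A, h, k; X, i, j]$ with $k \leq \ell$. □

Combining pairs of conditional items by $D^{(2)}$ is the key to logarithmic-time
parsing. This will be shown by the following example.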

Example 14.16.

Consider a grammar defined by

S -> aS | ab

and the string aaab. A parse tree for this string is shown in Figure 14.7.

Fig. 14.7. A parse tree for aaab

The CYK algorithm will need $n$ steps to parse a string of length $n$, no matter
how much parallelism is employed. A parallel Rytter parser will process a
string aaab as follows. From each of the hypotheses $[a, i-1, i]$ we can obtain
conditional items

[a,0,1] ⊢ [S,0,4; S,1,4],

[a,1,2] ⊢ [S,1,4; S,2,4],

[a,2,3] ⊢ [S,2,4; b,3,4].

The consequents of these three deduction steps, in combination with the
hypothesis [b,3,4], can be combined into a final item by pairwise combination.
We have

[S,0,4; S,1,4], [S,1,4; S,2,4] ⊢ [S,0,4; S,2,4],

[S,2,4; b,3,4], [b,3,4] ⊢ [S,2,4],

and, subsequently,

[S,0,4; S,2,4], [S,2,4] ⊢ [S,0,4].

It is clear that (for this grammar) a parallel Rytter parser will parse any
sentence in logarithmic time. That this also holds for arbitrary grammars
remains to be proven. □

In order to deepen our understanding of what is going on here, we will look 
at an implementation of the above Rytter system for an arbitrary grammar 
on a parallel random access machine (PRAM). This is an often used abstract 
machine model for the definition of parallel algorithms. A PRAM consists of 
an (in principle unbounded) number of different processors that have access 
to a central shared memory. There are in fact various PRAM models, that 
differ according to the possibilities for concurrent memory access. We will make
use of a so-called WRAM: different processors may read the same memory 
location at the same time; concurrent writing into the same memory location 
is allowed only if these processors write the same value. 

Algorithm 14.17. (logarithmic-time recognizer for binary branching grammars)

For the sake of simplicity we will only consider the recognition algorithm, and
do not (yet) bother to determine a parse forest. We consider an instantiated
parsing system for some string $a_1 \ldots a_n$, so we can restrict the system to the
actual string length $n$, rather than some arbitrary maximum string length $\ell$.
We write $\xi$ as a generic notation for CYK items in $\mathcal{I}^{(1)}$ and $(\xi \leftarrow \eta)$ as
a generic notation for items in $\mathcal{I}^{(2)}$. If $\xi = [A, h, k]$ and $\eta = [X, i, j]$ then
$(\xi \leftarrow \eta) = [A, h, k; X, i, j]$.

For each item $\xi \in \mathcal{I}^{(1)}$ we introduce a boolean predicate recognized($\xi$); for
each item $(\xi \leftarrow \eta) \in \mathcal{I}^{(2)}$ we introduce a boolean predicate proposed($(\xi \leftarrow \eta)$).
At the end of the algorithm, recognized($\xi$) will be true iff $\xi \in \mathcal{V}(\mathbb{P}(a_1 \ldots a_n))$
and proposed($(\xi \leftarrow \eta)$) will be true iff $(\xi \leftarrow \eta) \in \mathcal{V}(\mathbb{P}(a_1 \ldots a_n))$. We define
procedures initialize, propose, combine, and recognize as follows.

procedure initialize
begin
    for all $\xi \in \mathcal{I}^{(1)}$ do recognized($\xi$) := false od;
    for all $(\xi \leftarrow \eta) \in \mathcal{I}^{(2)}$ do proposed($(\xi \leftarrow \eta)$) := false od;
    for all $\xi \in H$ do recognized($\xi$) := true od
end;

procedure propose
begin
    for all $A \rightarrow XY \in P$ and appropriate $0 \leq i < j < k \leq n$
    do  if recognized($[X, i, j]$)
        then proposed($[A, i, k; Y, j, k]$) := true fi;
        if recognized($[Y, j, k]$)
        then proposed($[A, i, k; X, i, j]$) := true fi
    od
end;

procedure combine
begin
    for all $(\xi \leftarrow \eta), (\eta \leftarrow \zeta) \in \mathcal{I}^{(2)}$
    do  if proposed($(\xi \leftarrow \eta)$) and proposed($(\eta \leftarrow \zeta)$)
        then proposed($(\xi \leftarrow \zeta)$) := true fi
    od
end;

procedure recognize
begin
    for all $(\xi \leftarrow \eta) \in \mathcal{I}^{(2)}$
    do  if proposed($(\xi \leftarrow \eta)$) and recognized($\eta$)
        then recognized($\xi$) := true fi
    od
end;

It is clear that each of the above procedures can be executed in constant
time on a WRAM, given $O(n^6)$ processors and $O(n^4)$ shared memory. With
some more care the space complexity can be reduced to $O(n^2)$ (cf. Gibbons
and Rytter, [1988]), but at the expense of some clarity. For our boolean
circuit implementation this is irrelevant; it does not use memory. A variant
of Rytter’s algorithm can now be defined as follows.*

procedure Rytter’s algorithm (modified)
begin
    initialize;
    propose;
    repeat $\lceil \log_2 n \rceil$ times
    begin
        recognize;
        propose;
        combine;
        combine
    end;
    if $\mathit{recognized}([S,0,n])$ then accept else reject fi
end;

where $\lceil \log_2 n \rceil$ is the smallest natural number $\ge \log_2 n$. Hence, for example,
5 steps suffice for any sentence of up to 32 words. For a sentence of 1000 words
only 10 steps are needed (but at the cost of some $10^{18}$ processors, which is
not very realistic). □
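The control flow of Algorithm 14.17 can be tried out with a small sequential simulation. The sketch below is only an illustration, not a parallel implementation: items are represented as Python tuples, each procedure reads a snapshot of the previous state to mimic one synchronous parallel round, and the names `EXAMPLE_RULES`, `propose`, `combine`, `recognize`, and `rytter_recognize` are assumptions introduced here. It also assumes that terminals occur directly in right-hand sides, so that the hypotheses $[a_j, j-1, j]$ serve as the initially recognized items.

```python
from math import ceil, log2

# Hypothetical encoding of a binary branching grammar as (lhs, left, right)
# triples; here: S -> S A | a a,  A -> b B,  B -> S S.
EXAMPLE_RULES = [("S", "S", "A"), ("S", "a", "a"), ("A", "b", "B"), ("B", "S", "S")]


def propose(rules, recognized, proposed, n):
    """One synchronous propose round: a recognized sibling proposes its parent."""
    new = set(proposed)
    for (sym, i, j) in recognized:
        for (lhs, left, right) in rules:
            if left == sym:      # [X,i,j] recognized: propose [A,i,k;Y,j,k]
                for k in range(j + 1, n + 1):
                    new.add(((lhs, i, k), (right, j, k)))
            if right == sym:     # [Y,i,j] recognized: propose [A,h,j;X,h,i]
                for h in range(i):
                    new.add(((lhs, h, j), (left, h, i)))
    return new


def combine(proposed):
    """One synchronous combine round: (xi<-eta), (eta<-zeta) yield (xi<-zeta)."""
    gaps_of = {}
    for outer, gap in proposed:
        gaps_of.setdefault(outer, []).append(gap)
    new = set(proposed)
    for xi, eta in proposed:
        for zeta in gaps_of.get(eta, ()):
            new.add((xi, zeta))
    return new


def recognize(recognized, proposed):
    """One synchronous recognize round: (xi<-eta) and eta yield xi."""
    return recognized | {xi for xi, eta in proposed if eta in recognized}


def rytter_recognize(rules, start, words):
    n = len(words)
    recognized = {(w, j, j + 1) for j, w in enumerate(words)}  # initialize
    proposed = propose(rules, recognized, set(), n)
    for _ in range(ceil(log2(n)) if n > 0 else 0):
        recognized = recognize(recognized, proposed)
        proposed = propose(rules, recognized, proposed, n)
        proposed = combine(combine(proposed))
    return (start, 0, n) in recognized
```

Calling `rytter_recognize(EXAMPLE_RULES, "S", list("aabaaaa"))` runs three steps for the seven-word input. The simulation stores every proposed item explicitly, so it is only practical for short strings.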

For the CYK algorithm we used an upper triangular recognition matrix
$T_{CYK}$. For Rytter’s algorithm we can use a similar recognition structure $T_R$,
which is not a matrix but a pyramid. Table entries have three indices: the
leftmost and rightmost position marker (as with CYK) and, thirdly, the size
of an item. The size is the number of words in the string that is covered by
an item. Formally:

    $\mathit{size}([X,i,j]) = j - i$   for any $[X,i,j] \in \mathcal{I}^{(1)}$,                    (14.3)

    $\mathit{size}([A,h,k;X,i,j]) = \mathit{size}([A,h,k]) - \mathit{size}([X,i,j])$              (14.4)
    $\phantom{\mathit{size}([A,h,k;X,i,j])} = k - h - j + i$
        for any $[A,h,k;X,i,j] \in \mathcal{I}^{(2)}$.
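As a quick check of definitions (14.3) and (14.4), consider a hypothetical item with a gap, $[A,0,8;X,2,5]$ (the indices are chosen here purely for illustration). Its size is the number of words covered by the outer item minus the number of words covered by the gap:

```latex
\begin{align*}
\mathit{size}([A,0,8])       &= 8 - 0 = 8,\\
\mathit{size}([X,2,5])       &= 5 - 2 = 3,\\
\mathit{size}([A,0,8;X,2,5]) &= 8 - 0 - 5 + 2 = 5 .
\end{align*}
```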

A recognized item of the form $[X,i,j]$ will be stored in table entry $T_{i,j,j-i}$;
a proposed item of the form $[A,h,k;X,i,j]$ will be stored in table entry
$T_{h,k,k-h-j+i}$. All items of size 1 will be stored in the bottom layer of the
table; all items of size $k$ in the $k$-th (horizontal) layer. Note, furthermore, that
all cubes $T_{i,j,k}$ on the surface satisfy $k = j - i$. Hence, all proper CYK items
will be stored in these surface cubes. Items $[A,h,k;X,i,j]$ will be in a cube
inside the table, that is located exactly $j - i$ positions down from $T_{h,k,k-h}$.
Hence, in Figure 14.8 only the proper CYK items are visible; proposed items
with a gap are hidden under the surface. If the hidden cubes are deleted, the
conventional CYK table remains.

Fig. 14.8. An example of a Rytter recognition table

* In the original version of Rytter’s algorithm the initialization consists of initialize
only, and a step comprises a call to propose, combine, combine, recognize in that
order. The reason to change this is that it allows introduction of loop invariants
(14.5) and (14.6). Gibbons and Rytter employ a rather more complicated loop
invariant, for which reason their proof is rather more cumbersome.

The propose, combine, and recognize steps were originally called activate, square,
and pebble. As the term activate had to be changed, so as to avoid confusion
with activation of a node, we have also replaced the other terms with words that
seem more appropriate in this context.

The reason for constructing the pyramid-shaped table in Figure 14.8 is 
that it can be employed to visualize the logarithmic nature of Rytter’s al- 
gorithm. We can cut the table into slices, such that each slice will be filled 
by a single step of the algorithm. This is shown in Figure 14.9. In the above 
definition of Rytter’s algorithm it is in fact allowed that in step i items are 
recognized in some slice j > i. The algorithm can be improved by regarding 
in step i only those items that should go into slice i. But the important thing 
to notice, whether or not such a filter is applied, is that every valid item in 
slice i must have been recognized after i steps.
That is, the algorithm satisfies the following loop invariant statements:

    if $\xi \in \mathcal{V}(\mathbb{P}(a_1 \ldots a_n))$ and $\mathit{size}(\xi) \le 2^k$
    then $\mathit{recognized}(\xi)$ after $k$ steps;                                  (14.5)

    if $(\xi \leftarrow \eta) \in \mathcal{V}(\mathbb{P}(a_1 \ldots a_n))$ and $\mathit{size}((\xi \leftarrow \eta)) \le 2^k$
    then $\mathit{proposed}((\xi \leftarrow \eta))$ after $k$ steps.                  (14.6)

A proof will be given in the next section.



One may wonder why it is necessary to include two calls to combine within 
a single step. For the grammar in Example 14.16, a single combine per step 
will clearly be sufficient. That the second combine is necessary to guarantee 
the loop invariants (14.5) and (14.6) is shown by the following example. 

Example 14.18.

Consider a grammar that has the productions

    $S \to SA \mid aa$,
    $A \to bB$,
    $B \to SS$.

We can define a series of trees $\tau_1, \tau_2, \ldots$ by

    $\tau_1 = \langle S \leadsto aa \rangle$,
    $\tau_{k+1} = \langle S \leadsto \tau_k\, b\, \tau_k\, \tau_k \rangle$,

see Figure 14.10. It is easy to verify that, when only a single combine is
executed per step, recognition of $\tau_{k+1}$ will take two more steps than
recognition of $\tau_k$, while $\mathit{size}(\tau_{k+1}) = 3 \cdot \mathit{size}(\tau_k) + 1$. But to stay within the desired
complexity bounds, two more steps should be able to cope with a size
multiplication by 4. Hence, for large enough $k$ this must fail. The reader may
verify that $\tau_5$ yields a string of length 202. If only one combine operation per
step were allowed, then it would take 9 steps to compute $\mathcal{V}(\mathbb{P}(a_1 \ldots a_{202}))$,
while $\lceil \log_2 202 \rceil = 8$. □
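The yield lengths quoted in Example 14.18 follow directly from the recurrence for the size of $\tau_k$. The short sketch below merely tabulates that recurrence and the number of steps the logarithmic bound allows; the function names `tree_yield_lengths` and `steps_allowed` are introduced here for illustration, and the sketch does not simulate the single-combine variant.

```python
from math import ceil, log2


def tree_yield_lengths(count):
    """Yield lengths of tau_1 .. tau_count: size 2, then 3 * previous + 1."""
    lengths = [2]
    while len(lengths) < count:
        lengths.append(3 * lengths[-1] + 1)
    return lengths


def steps_allowed(n):
    """Number of steps of Algorithm 14.17 for a string of length n >= 1."""
    return ceil(log2(n))


for k, length in enumerate(tree_yield_lengths(5), start=1):
    print(f"tau_{k}: yield length {length}, steps allowed {steps_allowed(length)}")
```

Each level multiplies the yield length by roughly 3, so the allowed number of steps grows by about $\log_2 3 \approx 1.58$ per level, which is less than the two extra steps that a single combine per step would need.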



Fig. 14.10. Recursive definition of tree $\tau_k$ in Example 14.18

14.6 Correctness of Rytter’s algorithm

The soundness of Rytter’s algorithm, as presented in the previous section, is
trivially obtained from the fact that each of the procedures initialize, propose,
combine, and recognize is sound. The completeness of the algorithm follows
from the loop invariants (14.5) and (14.6). The major task is to establish
these loop invariants.

The correctness proof that is given here may seem far from trivial. It
should be noted, however, that it is considerably simpler than the original
proof of the “pebble game” by Gibbons and Rytter [1988]. Space complexity
on a WRAM is irrelevant for our purpose of constructing a boolean circuit
implementation. At the expense of $O(n^4)$, rather than $O(n^2)$, space complexity
we have been able to introduce a simple loop invariant, and to simplify the
presentation of the algorithm. The sliced pyramid in Figure 14.9 only applies
to our version of Rytter’s algorithm, not to the original algorithm.

We will introduce a few ad hoc concepts that are useful to simplify the
proof. Firstly, for easy reference, the operations within some step $k$ are
numbered as follows:

    $k$.1  recognize
    $k$.2  propose
    $k$.3  combine
    $k$.4  combine

Next, we assume that $\mathcal{I}^{(2)}$ contains items of the form $(\xi \leftarrow \xi)$ for any $\xi \in \mathcal{I}^{(1)}$.
Such items have zero size (the item is nothing but a large gap) but will
turn out to be practical as a boundary case. It is assumed that
$(\xi \leftarrow \xi) \in \mathcal{V}(\mathbb{P}(a_1 \ldots a_n))$ for any $\xi$. Moreover, it is assumed that
$\mathit{proposed}((\xi \leftarrow \xi))$ has been set to true in the initialization phase.

Furthermore, we replace the size function on items by a rank function
that corresponds to the step number after which an item must have been
recognized/proposed (if it is valid). We define

    $\mathit{rank}(\xi) = k$                     for $2^{k-1} < \mathit{size}(\xi) \le 2^k$,
    $\mathit{rank}((\xi \leftarrow \eta)) = k$   for $2^{k-1} < \mathit{size}((\xi \leftarrow \eta)) \le 2^k$,
    $\mathit{rank}((\xi \leftarrow \xi)) = -1$.

We can apply the notion of rank also to binary trees. An item $[X,i,j]$ can
be seen as a collection of binary trees with $j - i$ leaves. Hence the rank of a
binary tree is the (rounded) logarithm of the size of its yield. Furthermore,
the rank of a node in a tree is the rank of the sub-tree of which that node is
the root.

An important observation is that every binary tree of rank $k \ge 1$ has a
node of rank $k$ such that both children of this node have rank $< k$. We call
this the critical node. In order to find the critical node, start searching at the
top. If both children have rank $< k$, then stop. Otherwise, go to the child
with largest rank (which must be $k$) and continue searching from there.
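The rank function and the search for a critical node can be sketched in a few lines. This is a minimal illustration under assumptions that are not part of the text: binary trees are represented as nested pairs with strings as leaves, and the helper names `rank`, `leaves`, and `critical_node` are introduced here.

```python
def rank(size):
    """Rank k with 2**(k-1) < size <= 2**k; zero size gets rank -1."""
    if size == 0:
        return -1
    return (size - 1).bit_length()   # equals ceil(log2(size)) for size >= 1


def leaves(tree):
    """Number of leaves of a tree given as nested (left, right) pairs."""
    if isinstance(tree, str):
        return 1
    left, right = tree
    return leaves(left) + leaves(right)


def critical_node(tree):
    """Topmost-down search for a node of rank k whose children have rank < k."""
    if isinstance(tree, str):
        raise ValueError("a critical node only exists for trees of rank >= 1")
    k = rank(leaves(tree))
    node = tree
    while True:
        left, right = node
        left_rank, right_rank = rank(leaves(left)), rank(leaves(right))
        if max(left_rank, right_rank) < k:
            return node
        # The child with the largest rank must itself have rank k.
        node = left if left_rank >= right_rank else right
```

For a tree with six leaves whose left child has five leaves, the root has rank 3 but so does its left child, so the search descends once and returns that left child.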

We will generalize this idea to items, and show the existence of a critical 
item. 

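Definition 14.19. (critical item)

Let $\xi \in \mathcal{I}^{(1)}$, $\mathit{rank}(\xi) = k$. An item $\theta \in \mathcal{I}^{(1)}$ is called critical to $\xi$ if

(i) $(\xi \leftarrow \theta) \in \mathcal{V}(\mathbb{P}(a_1 \ldots a_n))$,

(ii) $\mathit{rank}(\theta) = k$,

(iii) there are $\eta, \zeta \in \mathcal{V}(\mathbb{P}(a_1 \ldots a_n))$ such that
      $\mathit{rank}(\eta) \le k - 1$, $\mathit{rank}(\zeta) \le k - 1$, and $\eta, \zeta \vdash_{CYK} \theta$

(where $\eta, \zeta \vdash_{CYK} \theta$ is a convenient abbreviation for “there are $A, X, Y, i, j, k$
such that $\eta = [X,i,j]$, $\zeta = [Y,j,k]$ and $\theta = [A,i,k]$ and, moreover, $A \to XY \in P$.”) □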



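Lemma 14.20.

For every item $\xi \in \mathcal{V}(\mathbb{P}(a_1 \ldots a_n))$ with $\mathit{rank}(\xi) \ge 1$ there is an item
$\theta \in \mathcal{V}(\mathbb{P}(a_1 \ldots a_n))$ such that $\theta$ is critical to $\xi$.

Proof. Let $\xi \in \mathcal{V}(\mathbb{P}(a_1 \ldots a_n))$ and $\mathit{rank}(\xi) = k \ge 1$.

Then there must be a pair of items $\varphi, \psi \in \mathcal{V}(\mathbb{P}(a_1 \ldots a_n))$ such that $\varphi, \psi \vdash_{CYK} \xi$.
Without loss of generality, we assume $\mathit{rank}(\varphi) \ge \mathit{rank}(\psi)$.

If $\mathit{rank}(\varphi) < \mathit{rank}(\xi)$ then $\theta = \xi$ and (i)-(iii) in Definition 14.19 are satisfied.
If $\mathit{rank}(\varphi) = \mathit{rank}(\xi) = k$ we can recursively search for a $\theta$ that is critical
to $\varphi$. There must be a pair $\varphi', \psi' \in \mathcal{V}(\mathbb{P}(a_1 \ldots a_n))$ such that $\varphi', \psi' \vdash_{CYK} \varphi$
and $\mathit{rank}(\varphi') \ge \mathit{rank}(\psi')$. In this way we find a sequence $\varphi, \varphi', \varphi'', \varphi'''$, and
so on. Note, however, that $\mathit{rank}(\xi) = \mathit{rank}(\varphi) = \mathit{rank}(\varphi') = \ldots$ but that
$\mathit{size}(\xi) > \mathit{size}(\varphi) > \mathit{size}(\varphi') > \ldots$; hence the recursion must end at some
critical item. It is easy to verify that if $\theta$ is critical to $\ldots, \varphi'', \varphi', \varphi$ then it is
also critical to $\xi$. □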

Corollary 14.21.

Let $(\xi \leftarrow \theta) \in \mathcal{V}(\mathbb{P}(a_1 \ldots a_n))$, and $\theta$ critical to $\xi$.

Then $\mathit{rank}((\xi \leftarrow \theta)) < \mathit{rank}(\xi)$. □

If $\mathit{proposed}((\xi \leftarrow \eta))$ is true at some moment, then this must have been
caused by a propose or by a combine operation. If it was a combine, then
there is a $\theta$ such that $(\xi \leftarrow \eta)$ was obtained as a combination of previously
proposed $(\xi \leftarrow \theta)$ and $(\theta \leftarrow \eta)$. Each of these has been proposed either by a
propose or by a combine operation, and so on. Ultimately, every proposed
item with a gap can be broken down into a sequence of items with a gap, all
fitting into each other, such that each item in this sequence has been proposed
by a propose operation. This is formalized as follows.

Definition 14.22. (item path)

Let $(\xi \leftarrow \eta) \in \mathcal{V}(\mathbb{P}(a_1 \ldots a_n))$. A sequence of valid items $\zeta_0, \ldots, \zeta_p$ is called an
item path from $\xi$ to $\eta$ if

(i) $\zeta_0 = \xi$ and $\zeta_p = \eta$,

(ii) for each $i$ with $1 \le i \le p$ there is some $\theta_i \in \mathcal{V}(\mathbb{P}(a_1 \ldots a_n))$ such that
     $\theta_i, \zeta_i \vdash_{CYK} \zeta_{i-1}$,

(iii) for each $i$ and $j$ with $0 \le i < j \le p$ it holds that
      $(\zeta_i \leftarrow \zeta_j) \in \mathcal{V}(\mathbb{P}(a_1 \ldots a_n))$. □



Lemma 14.23.

For every $(\xi \leftarrow \eta) \in \mathcal{V}(\mathbb{P}(a_1 \ldots a_n))$ there is an item path from $\xi$ to $\eta$.

Proof: direct from the above discussion. □

The reason for retrieving an item path is that, in the sequel, we will need
to cut an item $(\xi \leftarrow \eta)$ of rank $k$ into pieces of rank $< k$. To that end, we need
one more auxiliary concept. A critical step on an item path is located such
that both remaining parts, above and below the critical step, are of lower
rank.



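Definition 14.24. (critical step on an item path)

Let $(\xi \leftarrow \eta) \in \mathcal{V}(\mathbb{P}(a_1 \ldots a_n))$ and $\mathit{rank}((\xi \leftarrow \eta)) = k > 0$. Furthermore, let
$\xi = \zeta_0, \ldots, \zeta_p = \eta$ be an item path from $\xi$ to $\eta$. An item $(\zeta_{i-1} \leftarrow \zeta_i)$ is called a
critical step of $\zeta_0, \ldots, \zeta_p$ if

(i) $\mathit{rank}((\xi \leftarrow \zeta_{i-1})) \le k - 1$,

(ii) $\mathit{rank}((\zeta_i \leftarrow \eta)) \le k - 1$. □

Lemma 14.25.

For every $(\xi \leftarrow \eta) \in \mathcal{V}(\mathbb{P}(a_1 \ldots a_n))$ there is a critical step on every item path
from $\xi$ to $\eta$.

Proof: trivial. □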



Having introduced all the necessary technical machinery, we can now 
prove the loop invariants. 

Lemma 14.26.

Algorithm 14.17 satisfies the following statements for any $k \ge 0$:

(I)$_k$   if $\xi \in \mathcal{V}(\mathbb{P}(a_1 \ldots a_n))$ and $\mathit{rank}(\xi) \le k$
          then $\mathit{recognized}(\xi)$ after $k$ steps;

(II)$_k$  if $(\xi \leftarrow \eta) \in \mathcal{V}(\mathbb{P}(a_1 \ldots a_n))$ and $\mathit{rank}((\xi \leftarrow \eta)) \le k$
          then $\mathit{proposed}((\xi \leftarrow \eta))$ after $k$ steps.

These are reformulations of the loop invariants (14.5) and (14.6); an index $k$
has been added for easy reference.

Proof. The correctness of (I)$_0$, (II)$_0$, and (I)$_1$ is trivial. We will complete
the proof by showing that the implication

    (II)$_{k-1}$ $\wedge$ (I)$_k$ $\Rightarrow$ (II)$_k$ $\wedge$ (I)$_{k+1}$                (14.7)

holds for any $k \ge 1$. So we assume (II)$_{k-1}$ and (I)$_k$.

(II)$_k$:

Let $(\xi \leftarrow \eta) \in \mathcal{V}(\mathbb{P}(a_1 \ldots a_n))$ and $\mathit{rank}((\xi \leftarrow \eta)) = k \ge 1$. We will show
that $(\xi \leftarrow \eta)$ must have been proposed after step $k$.

Let $(\varphi \leftarrow \psi)$ be a critical step on an item path from $\xi$ to $\eta$.

Then there is some $\psi' \in \mathcal{V}(\mathbb{P}(a_1 \ldots a_n))$ such that $\psi', \psi \vdash_{CYK} \varphi$ (cf.
Definition 14.22.(ii)).

Furthermore, $\mathit{rank}((\xi \leftarrow \varphi)) \le k - 1$, $\mathit{rank}((\psi \leftarrow \eta)) \le k - 1$ (cf. Lemma
14.25), and, obviously, $\mathit{rank}(\psi') = \mathit{rank}((\varphi \leftarrow \psi)) \le k$. See Figure 14.11.






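[Figure: the item path from $\xi$ through $\varphi$ and $\psi$ (with sibling $\psi'$) to $\eta$, annotated with $\mathit{rank}((\xi \leftarrow \varphi)) \le k - 1$, $\mathit{rank}(\psi') \le k$, and $\mathit{rank}((\psi \leftarrow \eta)) \le k - 1$.]

Fig. 14.11. A sketch of the proof of (II)$_k$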



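Consequently, we find:

    after step $k-1$:  $\mathit{proposed}((\xi \leftarrow \varphi))$                       (from (II)$_{k-1}$),
                       $\mathit{proposed}((\psi \leftarrow \eta))$                         (from (II)$_{k-1}$);
    after step $k$.1:  $\mathit{recognized}(\psi')$                                        (from (I)$_k$);
    in step $k$.2:     $\psi' \vdash_R (\varphi \leftarrow \psi)$;
    in step $k$.3:     $(\xi \leftarrow \varphi), (\varphi \leftarrow \psi) \vdash_R (\xi \leftarrow \psi)$;
    in step $k$.4:     $(\xi \leftarrow \psi), (\psi \leftarrow \eta) \vdash_R (\xi \leftarrow \eta)$;

where $\vdash_R$ indicates deduction by Rytter’s algorithm.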

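(I)$_{k+1}$:

Let $\xi \in \mathcal{V}(\mathbb{P}(a_1 \ldots a_n))$ and $\mathit{rank}(\xi) = k + 1 \ge 2$. We will show that $\xi$
must have been recognized after step $k + 1$. By Lemma 14.20 there is
some $\theta \in \mathcal{V}(\mathbb{P}(a_1 \ldots a_n))$ critical to $\xi$, and there are $\eta, \zeta$ with rank $\le k$
such that $\eta, \zeta \vdash_{CYK} \theta$ (cf. Definition 14.19).

It must hold that $(\xi \leftarrow \theta) \in \mathcal{V}(\mathbb{P}(a_1 \ldots a_n))$. Note, furthermore, that
$\mathit{rank}((\xi \leftarrow \theta)) \le k$.

We distinguish two cases: $\xi \ne \theta$ and $\xi = \theta$. First, we assume $\xi \ne \theta$.

Let $(\varphi \leftarrow \psi)$ be a critical step on an item path from $\xi$ to $\theta$, similar to the
above case. The situation is depicted in Figure 14.12.

[Figure: the item path from $\xi$ through $\varphi$ and $\psi$ (with sibling $\psi'$) to $\theta$, whose children are $\eta$ and $\zeta$, annotated with $\mathit{rank}((\xi \leftarrow \varphi)) \le k - 1$, $\mathit{rank}(\psi') \le k$, $\mathit{rank}((\psi \leftarrow \theta)) \le k - 1$, $\mathit{rank}(\eta) \le k$, and $\mathit{rank}(\zeta) \le k$.]

Fig. 14.12. A sketch of the proof of (I)$_{k+1}$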



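Consequently, we find:

    after step $k-1$:    $\mathit{proposed}((\xi \leftarrow \varphi))$                     (from (II)$_{k-1}$),
                         $\mathit{proposed}((\psi \leftarrow \theta))$                     (from (II)$_{k-1}$);
    after step $k$.1:    $\mathit{recognized}(\eta)$                                       (from (I)$_k$),
                         $\mathit{recognized}(\zeta)$                                      (from (I)$_k$),
                         $\mathit{recognized}(\psi')$                                      (from (I)$_k$);
    in step $k$.2:       $\psi' \vdash_R (\varphi \leftarrow \psi)$,
                         $\zeta \vdash_R (\theta \leftarrow \eta)$;
    in step $k$.3:       $(\xi \leftarrow \varphi), (\varphi \leftarrow \psi) \vdash_R (\xi \leftarrow \psi)$,
                         $(\psi \leftarrow \theta), (\theta \leftarrow \eta) \vdash_R (\psi \leftarrow \eta)$;
    in step $k$.4:       $(\xi \leftarrow \psi), (\psi \leftarrow \eta) \vdash_R (\xi \leftarrow \eta)$;
    in step $(k+1)$.1:   $(\xi \leftarrow \eta), \eta \vdash_R \xi$.

Otherwise, if $\xi = \theta$, it is clear that the deduction of $\theta$ from $\eta$ and $\zeta$ will be
completed in step $(k+1)$.1.

Thus we have finished the proof of implication (14.7). □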

Theorem 14.27. (correctness of Rytter’s algorithm)

Algorithm 14.17 is correct.

Proof. Soundness is straightforward from the definition of the algorithm;
completeness has been proven in Lemma 14.26. □

14.7 A parsing network for Rytter’s algorithm

A parsing network for Rytter’s algorithm can be obtained by applying the
network construction of Definition 14.13 to a parsing system according to
Schema 14.15. A more subtle approach is possible, however, if we realize that
it is only items of the form $[X,i,j]$ that we are interested in. The items with
a gap $[A,h,k;X,i,j]$ have only been introduced as auxiliary constructs, so as
to allow recognition in logarithmic time. If we restrict the notion of parsability
to items without a gap, we can extend the recognition algorithm to a parsing
algorithm somewhat more easily.

As a direct consequence of the definition of parsability (cf. Definition
14.11), we find that an item $[X,i,j]$ is parsable, that is,
$[X,i,j] \in \mathcal{W}(\mathbb{P}(a_1 \ldots a_n))$, if and only if

(i) $[X,i,j] \in \mathcal{V}(\mathbb{P}(a_1 \ldots a_n))$,

(ii) $[S,0,n;X,i,j] \in \mathcal{V}(\mathbb{P}(a_1 \ldots a_n))$.

The loop invariant (14.6) guarantees that every valid item $[S,0,n;X,i,j]$ will
have been proposed by the recognition algorithm. So the only thing we have
to do for every recognized item is to check for an appropriate proposed item
with a gap.

Algorithm 14.28. (logarithmic-time parser for binary branching grammars)
The recognition algorithm 14.17 is extended to a parsing algorithm by adding
a procedure parse that needs to be called only once, after the repeat loop
has finished. We add a boolean predicate $\mathit{parsed}(\xi)$ for every $\xi \in \mathcal{I}^{(1)}$. This will
be set to true if $\xi \in \mathcal{W}(\mathbb{P}(a_1 \ldots a_n))$.

procedure parse
begin
    for all $\xi \in \mathcal{I}^{(1)}$ do $\mathit{parsed}(\xi)$ := false od;
    if $\mathit{recognized}([S,0,n])$
    then $\mathit{parsed}([S,0,n])$ := true;
         for all $[X,i,j] \in \mathcal{I}^{(1)}$
         do if $\mathit{proposed}([S,0,n;X,i,j])$
               and $\mathit{recognized}([X,i,j])$
            then $\mathit{parsed}([X,i,j])$ := true fi
         od
    fi
end; □
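The parse procedure is a single pass over the recognized items. As a minimal sketch, assume that the final `recognized` and `proposed` predicates are available as Python sets of tuples, with an item $[X,i,j]$ written as `(X, i, j)` and an item with a gap as a pair `(outer, inner)`; this representation and the function name `parse` are assumptions made for the illustration.

```python
def parse(recognized, proposed, start, n):
    """Return the set of parsable items, following procedure parse."""
    root = (start, 0, n)
    if root not in recognized:
        return set()            # the sentence is rejected: nothing is parsable
    parsed = {root}
    for item in recognized:
        # An item is parsable if it is valid and fits into a gap of [S,0,n].
        if (root, item) in proposed:
            parsed.add(item)
    return parsed


# Hand-built final state for the string "aa" with production S -> a a,
# plus one valid item ("B", 0, 1) that does not contribute to the parse.
recognized = {("a", 0, 1), ("a", 1, 2), ("S", 0, 2), ("B", 0, 1)}
proposed = {(("S", 0, 2), ("a", 1, 2)), (("S", 0, 2), ("a", 0, 1))}
print(sorted(parse(recognized, proposed, "S", 2)))
```

On a parallel machine every item would be tested simultaneously; the loop here only serializes those independent tests.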

A parsing network for Rytter’s algorithm is defined as follows. 

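Definition 14.29. (a parsing network for Rytter’s algorithm)

Let $\mathbb{P} = \langle \mathcal{I}, \mathcal{H}, D \rangle$ be a Rytter parsing system as in Schema 14.15, $\mathbb{P}_\ell$ the
system restricted to some maximum sentence length $\ell$. A recognizing network
for $\mathbb{P}_\ell$ is a boolean circuit that has the following nodes

(a) an OR-node $((\mathit{accept}))$,
(b) an AND-node $\langle \mathit{accept}, i \rangle$ for $0 \le i \le \ell$,
(c) an OR-node $((\xi))$ for each $\xi \in \mathcal{I}_\ell^{(1)} \cup \mathcal{H}_\ell$,
(d) a parse node for $\xi$, for each $\xi \in \mathcal{I}_\ell^{(1)} \cup \mathcal{H}_\ell$,
(e) an auxiliary node $((Q\xi))$ for each $\xi \in \mathcal{I}_\ell^{(1)} \cup \mathcal{H}_\ell$,
(f) an OR-node $(((\xi \leftarrow \eta)))$ for each $(\xi \leftarrow \eta) \in \mathcal{I}_\ell^{(2)}$,
(g) an AND-node $\langle (\xi \leftarrow \eta), \eta \rangle$ for each $(\xi \leftarrow \eta) \in \mathcal{I}_\ell^{(2)}$,
(h) an AND-node $\langle (\xi \leftarrow \eta), (\eta \leftarrow \zeta) \rangle$ for each $(\xi \leftarrow \eta), (\eta \leftarrow \zeta) \in \mathcal{I}_\ell^{(2)}$;

and the following connections:

(i)    $((\eta)) \to (((\xi \leftarrow \zeta)))$   for $\eta, \zeta \vdash_{CYK} \xi$ (or $\zeta, \eta \vdash_{CYK} \xi$),
(ii)   $(((\xi \leftarrow \eta))) \to \langle (\xi \leftarrow \eta), \eta \rangle$   for each $(\xi \leftarrow \eta) \in \mathcal{I}_\ell^{(2)}$,
(iii)  $((\eta)) \to \langle (\xi \leftarrow \eta), \eta \rangle$   for each $(\xi \leftarrow \eta) \in \mathcal{I}_\ell^{(2)}$,
(iv)   $\langle (\xi \leftarrow \eta), \eta \rangle \to ((\xi))$   for each $(\xi \leftarrow \eta) \in \mathcal{I}_\ell^{(2)}$,
(v)    $(((\xi \leftarrow \eta))) \to \langle (\xi \leftarrow \eta), (\eta \leftarrow \zeta) \rangle$   for each $(\xi \leftarrow \eta), (\eta \leftarrow \zeta) \in \mathcal{I}_\ell^{(2)}$,
(vi)   $(((\eta \leftarrow \zeta))) \to \langle (\xi \leftarrow \eta), (\eta \leftarrow \zeta) \rangle$   for each $(\xi \leftarrow \eta), (\eta \leftarrow \zeta) \in \mathcal{I}_\ell^{(2)}$,
(vii)  $\langle (\xi \leftarrow \eta), (\eta \leftarrow \zeta) \rangle \to (((\xi \leftarrow \zeta)))$   for each $(\xi \leftarrow \eta), (\eta \leftarrow \zeta) \in \mathcal{I}_\ell^{(2)}$,
(viii) $(([\$, i, i+1])) \to \langle \mathit{accept}, i \rangle$   for $2 \le i \le \ell$,
(ix)   $(([S,0,i])) \to \langle \mathit{accept}, i \rangle$   for $2 \le i \le \ell$,
(x)    $\langle \mathit{accept}, i \rangle \to ((\mathit{accept}))$   for $2 \le i \le \ell$,
(xi)   $((([S,0,k;X,i,j]))) \to ((Q[X,i,j]))$   for $[S,0,k;X,i,j] \in \mathcal{I}_\ell^{(2)}$,
(xii)  $((Q\xi)) \to$ the parse node for $\xi$   for each $\xi \in \mathcal{I}_\ell^{(1)} \cup \mathcal{H}_\ell$,
(xiii) $((\xi)) \to$ the parse node for $\xi$   for each $\xi \in \mathcal{I}_\ell^{(1)} \cup \mathcal{H}_\ell$. □

The condition $\zeta, \eta \vdash_{CYK} \xi$ in (i) is redundant, because antecedents of
deduction steps are not ordered. Note, furthermore, that (i) denotes unary
deduction steps, hence there is no need for an intermediate AND-node. As
a consequence, a propose needs only one clock tick, while all recognize and
combine need two clock ticks each.

Theorem 14.30. (boolean circuit implementation of Rytter’s algorithm)

Let $\mathbb{P}$ be an uninstantiated Rytter system according to Schema 14.15 for
some grammar $G \in \mathcal{BB}$. Let a boolean circuit for some maximum string
length $\ell$ be given by Definition 14.29. Then the following statements hold:

• $((\mathit{accept}))$ will be activated if and only if $a_1 \ldots a_n \in L(G)$,

• $((\xi))$ will be activated if and only if $\xi \in \mathcal{V}(\mathbb{P}(a_1 \ldots a_n))$,

• $(((\xi \leftarrow \eta)))$ will be activated if and only if $(\xi \leftarrow \eta) \in \mathcal{V}(\mathbb{P}(a_1 \ldots a_n))$,

• the parse node for $\xi$ will be activated if and only if $\xi \in \mathcal{W}(\mathbb{P}(a_1 \ldots a_n))$,

• The network is in a stable state after $7 \lceil \log_2 n \rceil + 3$ clock ticks.

• The network has $O(|V|^3 \ell^6)$ nodes and edges, with $|V| = |N \cup \Sigma|$ the number
  of different grammar symbols.

Proof: straightforward from Theorem 14.27 and Definition 14.29. □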
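The factor 7 in the stabilization bound can be read off from the clock ticks per operation: one step consists of a recognize (two ticks), a propose (one tick), and two combines (two ticks each). The worked figures below apply the bound of Theorem 14.30 to the two sentence lengths used earlier as examples; the reading of the additive constant 3 as covering the initial propose and the final accept stage is an assumption, not something the text states.

```latex
\begin{align*}
\text{ticks per step} &= \underbrace{2}_{\text{recognize}}
                       + \underbrace{1}_{\text{propose}}
                       + \underbrace{2 + 2}_{\text{combine, combine}} = 7,\\
n = 32:   \quad 7 \lceil \log_2 32 \rceil + 3   &= 7 \cdot 5 + 3  = 38,\\
n = 1000: \quad 7 \lceil \log_2 1000 \rceil + 3 &= 7 \cdot 10 + 3 = 73.
\end{align*}
```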

The network that has been defined above contains a lot of nodes and
edges that are not useful. Just like in the CYK case, useless parts can be
removed by meta-parsing, where the network is started with all hypotheses
in $\mathcal{H}_\ell$ validated. If the parse node for $\xi$ remains inactive, then $((\xi))$, $((Q\xi))$,
and the parse node itself can be deleted. Furthermore, if $(((\xi \leftarrow \eta)))$ remains
inactive, then this can be deleted from the network as well. The AND-nodes
that implement deduction steps can be trimmed accordingly. Note that we have
defined parsability only for items $\xi$, not for items $(\xi \leftarrow \eta)$. Hence if, say, $\xi$ is
valid but not parsable, a valid item $(\xi \leftarrow \eta)$ may remain while $\xi$ is being
discarded. If we run the meta-parsing algorithm a second time on the already
optimized network, some more items and corresponding AND-nodes and edges
can be deleted.

Two iterations of the meta-parsing algorithm suffice; a third iteration will
not filter out any other nodes.

The same result is obtained if the first iteration of meta-parsing only
considers items of the form $\xi$ and the second iteration only considers items
of the form $(\xi \leftarrow \eta)$.

As in Section 14.4 we can employ a weaker meta-parsing algorithm if we
are interested in collecting all valid items for (possibly incorrect) sentences.
In that case, a single iteration of the meta-parsing algorithm suffices, in which
invalidity, rather than unparsability, is used as the criterion to discard nodes.



14.8 Conditional parsing systems 

A Rytter system for a binary branching grammar has been obtained from 
a CYK system for binary branching grammars by adding conditional items 
and changing the deduction steps. This approach can be generalized to other 
parsing systems as well. In this way we can obtain logarithmic-time parallel 
parsing algorithms and boolean circuit implementations for arbitrary context- 
free grammars. 

We will define conditional parsing systems for arbitrary parsing systems. 
For the sake of simplicity, we will first do this for binary branching parsing 
systems, where every deduction step has exactly two antecedents. Afterwards 
we generalize this to deduction steps with any number of antecedents. The 
generalization is not difficult, but involves some more details that distract 
from the simplicity of the basic idea.

Definition 14.31. (potential ancestor)

Let $\mathbb{P} = \langle \mathcal{I}, \mathcal{H}, D \rangle$ be a parsing system, $\xi \in \mathcal{I}$. The potential ancestors of $\xi$
are inductively defined by

(i) if $\eta_1, \ldots, \eta_k \vdash \xi \in D$ then, for $1 \le i \le k$, $\eta_i$ is a potential ancestor of $\xi$;
(ii) if $\zeta$ is a potential ancestor of $\xi$ and $\eta_1, \ldots, \eta_k \vdash \zeta \in D$ then, for $1 \le i \le k$,
     $\eta_i$ is a potential ancestor of $\xi$. □
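The inductive definition of potential ancestors is a reachability computation over the deduction steps. The sketch below assumes, for illustration only, that a deduction step $\eta_1, \ldots, \eta_k \vdash \xi$ is given as a pair `(antecedents, consequent)` of hashable items; the function name `potential_ancestors` is introduced here.

```python
def potential_ancestors(steps, item):
    """Set of potential ancestors of item, following clauses (i) and (ii)."""
    antecedents_of = {}
    for antecedents, consequent in steps:
        antecedents_of.setdefault(consequent, set()).update(antecedents)

    found = set()
    agenda = [item]
    while agenda:
        current = agenda.pop()
        # Clause (i) for current == item, clause (ii) for any ancestor found so far.
        for eta in antecedents_of.get(current, ()):
            if eta not in found:
                found.add(eta)
                agenda.append(eta)
    return found


# Hypothetical deduction steps:  b, c |- a;   d |- b;   e, f |- x.
steps = [(("b", "c"), "a"), (("d",), "b"), (("e", "f"), "x")]
print(sorted(potential_ancestors(steps, "a")))
```

The conditional items of the next definition are exactly the pairs of an item and one of its potential ancestors, so this set determines how many items with a gap the construction introduces.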

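Definition 14.32. (conditional binary branching parsing system)

Let $\mathbb{P} = \langle \mathcal{I}, \mathcal{H}, D \rangle$ be an (uninstantiated) binary branching parsing system
(cf. Definition 14.3).

A conditional binary branching parsing system $\mathbb{C} = \langle \mathcal{I} \cup \mathcal{J}, \mathcal{H}, \mathcal{D} \rangle$ is defined
by

    $\mathcal{J} = \{ (\xi \leftarrow \eta) \mid \xi \in \mathcal{I} \wedge \eta \text{ is a potential ancestor of } \xi \}$;

    $\mathcal{D}^{(1)} = \{ \zeta \vdash (\xi \leftarrow \eta) \mid \eta, \zeta \vdash \xi \in D \}$,
    $\mathcal{D}^{(2)} = \{ \eta, (\xi \leftarrow \eta) \vdash \xi \}$,
    $\mathcal{D}^{(3)} = \{ (\xi \leftarrow \eta), (\eta \leftarrow \zeta) \vdash (\xi \leftarrow \zeta) \}$,

    $\mathcal{D} = \mathcal{D}^{(1)} \cup \mathcal{D}^{(2)} \cup \mathcal{D}^{(3)}$. □

The reader may verify that if $\mathbb{P}$ is a CYK system for a grammar $G \in \mathcal{BB}$ (cf.
Example 14.4), then $\mathbb{C}$ is a Rytter system as defined by Schema 14.15.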

Next, we will consider the case that deduction steps have at most two
antecedents. We may also have unary deduction steps of the form $\eta \vdash \xi$ or
even 0-ary deduction steps $\vdash \xi$. For unary deduction steps, we can distinguish
between initial deduction steps, where the antecedent is a hypothesis, and
non-initial deduction steps where the antecedent is not a hypothesis. The
former type is hardly relevant, because these are only used in the first step for
further initialization. The non-initial unary deduction steps need some special
treatment if we are to retain the recursive doubling technique that changes
linear-time parallel algorithms into logarithmic-time parallel algorithms. The
general idea is quite simple: if $\eta, \zeta \vdash \xi$ and $\xi \vdash^* \theta$, then we add a deduction
step $\eta, \zeta \vdash \theta$. This technique has been applied by Graham, Harrison, and
Ruzzo, for example, to obtain a more efficient Earley parser (cf. Example
6.18). Note that $\xi \vdash^* \theta$ does not necessarily imply that only unary deduction
steps are applied. A deduction sequence $\xi \vdash \ldots \vdash \theta$ could include deduction
steps of any arity.

Definition 14.33. (binary closure of deduction steps)

Let $\mathbb{P} = \langle \mathcal{I}, \mathcal{H}, D \rangle$ be a parsing system, where no deduction steps in $D$ have
more than 2 antecedents. The binary closure $\bar{D}$ of $D$ is defined by

    $\bar{D}^{(0)} = \{ \vdash \theta \mid\ \vdash_D \xi \wedge \xi \vdash_D^* \theta \}$,
    $\bar{D}^{(1)} = \{ \eta \vdash \theta \mid \eta \in \mathcal{H} \wedge \eta \vdash_D^* \theta \}$,
    $\bar{D}^{(2)} = \{ \eta, \zeta \vdash \theta \mid \eta, \zeta \vdash_D \xi \wedge \xi \vdash_D^* \theta \}$,

    $\bar{D} = \bar{D}^{(0)} \cup \bar{D}^{(1)} \cup \bar{D}^{(2)}$,

where $\vdash_D$ and $\vdash_D^*$ denote (transitive closure of) deduction steps in $D$.

Note that all 0-ary and unary deduction steps in the binary closure are initial.
□



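Definition 14.34. (conditional parsing system)

Let $\mathbb{P} = \langle \mathcal{I}, \mathcal{H}, D \rangle$ be an (uninstantiated) parsing system, such that no
deduction step in $D$ has more than 2 antecedents.

A conditional parsing system $\mathbb{C} = \langle \mathcal{I} \cup \mathcal{J}, \mathcal{H}, \mathcal{D} \rangle$ is defined by

    $\mathcal{J} = \{ (\xi \leftarrow \eta) \mid \xi \in \mathcal{I} \wedge \eta \text{ is a potential ancestor of } \xi \}$;

    $\mathcal{D}^{\mathit{init}} = \bar{D}^{(0)} \cup \bar{D}^{(1)}$,
    $\mathcal{D}^{(1)} = \{ \zeta \vdash (\xi \leftarrow \eta) \mid \eta, \zeta \vdash \xi \in \bar{D} \}$,
    $\mathcal{D}^{(2)} = \{ \eta, (\xi \leftarrow \eta) \vdash \xi \}$,
    $\mathcal{D}^{(3)} = \{ (\xi \leftarrow \eta), (\eta \leftarrow \zeta) \vdash (\xi \leftarrow \zeta) \}$,

    $\mathcal{D} = \mathcal{D}^{\mathit{init}} \cup \mathcal{D}^{(1)} \cup \mathcal{D}^{(2)} \cup \mathcal{D}^{(3)}$. □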



Example 14.35.

The algorithm of Chiang and Fu, laid down in the parsing schema ChF,
is a further small optimization of the GHR algorithm without top-down
filtering. Like the Earley algorithm, items $[A \to \alpha \bullet \beta, i, j]$ are recognized if
$\alpha \Rightarrow^* a_{i+1} \ldots a_j$. By making use of binary closure techniques, it is guaranteed
that all items with position markers $i, j$ can be computed simultaneously in
one step, when all items of the form $[A \to \alpha \bullet \beta, i, k]$ and $[A \to \alpha \bullet \beta, k, j]$ with
$i < k < j$ are known. See Example 6.20 on page 115 for the details. Let $\mathbb{P}_{ChF}$
be a parsing system for any context-free grammar $G \in \mathcal{CFG}$. A conditional
system $\mathbb{C}_{ChF}$, according to Definition 14.34, can be implemented in a boolean
circuit in logarithmic time, similar to the Rytter case. A detailed treatment
is given in [Sikkel, 1990]. □

Another example of a logarithmic-time algorithm for arbitrary context-
free grammars is the “fast” version of the algorithm of de Vreught and Honig
[1990, 1991] (cf. Chapter 6).

We have not yet covered parsing systems with ternary and higher order
deduction steps. For a deduction step

    $\eta_1, \ldots, \eta_k \vdash \xi$

we can define conditional items

    $(\xi \leftarrow \eta_1, \ldots, \eta_j)$   for $1 \le j \le k$

and deduction steps

    $(\xi \leftarrow \eta_1, \ldots, \eta_j),\ \eta_j \vdash (\xi \leftarrow \eta_1, \ldots, \eta_{j-1})$.

These are of little practical use. If a deduction system has deduction steps
with 3 or more antecedents, it is more usual to apply a step refinement (cf.
Chapter 5), to reduce these to binary deduction steps. In Example 5.20 we
have defined the generalized CYK algorithm for arbitrary context-free
grammars. Because of the arbitrary number of antecedents (corresponding to the
length of the right-hand side of a production), the complexity of a GCYK
parser can be arbitrarily large as well. The canonical way to tackle this is to
refine GCYK into a bottom-up Earley parser, that scans right-hand sides of
productions one at a time. The dotted productions $A \to \alpha \bullet \beta$ that are used
in Earley items constitute a bilinear cover of the grammar; cf. Leermakers
[1989].

In some cases we have added antecedents to deduction steps to apply 
dynamic filtering; some antecedents are not used in the construction of the 
consequent, but encode conditions on the environment for applying deduction 
steps. Examples are the parsing schemata 6.12 and 6.13 for de Vreught and 
Honig’s algorithm and the context-free head-corner schema 11.10. In all of 
these cases only two antecedents are used for “constructing” the consequent, 
hence the recursive doubling technique can be used with proper adaptation. 
Further details are beyond the scope of this chapter. 



14.9 Related approaches 

A logarithmic-time algorithm that is almost identical to Rytter’s algorithm 
has in fact been published earlier by Brent and Goldschlager [1984]. 

Our version of Rytter’s algorithm is slightly different from the original
[Rytter, 1985], [Gibbons and Rytter, 1988]. The advantage of our
presentation is twofold. On the one hand we have obtained an easy loop invariant,
that simplifies the presentation as well as the correctness proof. On the other
hand, a boolean circuit implementation in $O(\log n)$ time with $O(n^6)$
processing units can be obtained. While $O(n^6)$ processing units is the minimum
that is known for logarithmic-time parsing algorithms on a WRAM, it is
not self-evident that the same complexity bounds apply to a boolean circuit
implementation.*

A logarithmic-time algorithm for arbitrary context-free grammars has 
been defined by De Vreught and Honig [1990, 1991]. This is a conditional 
variant of their algorithm that has been discussed extensively in Chapter 6. 
A conditional variant of the algorithm of Chiang and Fu [1984] is worked out 
in detail in [Sikkel, 1990]. 

Boolean circuits can be seen as a specific kind of neural networks. A neural 
network consists of a large number of simple processing units, that compute 
the output as a function of the input. Neurons can be “on” and “off”, as 
our nodes, but the function that is used to compute the state of a neuron is 
different. Typically, a sigmoid function over the weighted sum of the inputs 
determines the probability of activating a neuron. 

The main difference, however, between our connectionist implementations 
by means of boolean circuits and mainstream neural networks research is that 
of local vs. distributed representation. Characteristic for neural networks is 
the holistic representation of information. Concepts are represented by acti- 
vation patterns, rather than individual neurons [Rumelhart and McClelland, 
1986], [McClelland and Rumelhart, 1986]. A typical application of neural net- 
works is pattern recognition. When some input is offered to a network, it will 
stabilize in the state that represents the best fitting pattern from a set of 
patterns that the network has learned to recognize. 

In our approach, concepts are mapped to nodes: if an item is valid or 
parsable, one specific node will be activated to indicate this fact. This localist 
approach is also used in other parsing networks that are called “connectionist” 
or “neural”. Our linear-time parsing network is (a small improvement of) 
Fanty’s network [1985, 1986]; a generalization to Earley’s algorithm is given 
by Nijholt [1990]. Selman and Hirst [1987] describe a Boltzmann machine 
parser; Howells [1988] gives a relaxation algorithm that uses decay over time; 
Nakagawa and Mori [1988] present a parallel left-corner parser incorporated 
in a learning network. A neural network with distributed representation is 
used by Drossaers [1992] for recognition of regular languages. 

The inherent parallelism in connectionist networks offers possibilities to 
integrate syntactic processing with semantic processing and disambiguation. 
This has been studied, among others, by Waltz and Pollack [1988], Cottrell 
and Small [1984] and Cottrell [1989]. 



* There is a general method to convert algorithms on parallel random access
machines to boolean circuits, due to Stockmeyer and Vishkin [1984], but that would
yield an implementation with O(n^^) units in this specific case.

14.10 Conclusion

Parsing schemata can be encoded straightforwardly into boolean circuits in 
such a way that valid items are represented by activated nodes. In this chapter 
we have added the notion of parsability, which applies to those items that are not 
only valid but also have been effective in recognizing the sentence. The set 
of activated parse nodes in a boolean circuit gives an encoding of the shared 
parse forest of a sentence. 

These techniques are straightforward adaptations of a network design 
originally described by Fanty [1985]. We have shown that this can also be 
applied to logarithmic-time algorithms. Along the way, we have simplified 
both the presentation and the correctness proof of Rytter’s algorithm. 

In Section 14.8 we have shown that Rytter’s algorithm can be seen as a specific 
instance of a more general notion of conditional parsing systems that can be 
applied to other parsing schemata, so as to obtain logarithmic-time networks 
that parse arbitrary context-free grammars. 

The results for logarithmic-time boolean circuits have theoretical, rather 
than practical value, because the number of processing units that is required 
is unrealistically high. But it is an indication that boolean circuits provide a 
useful abstract machine model for parallel parsing algorithms. This observa- 
tion is not unimportant, because any (uninstantiated) parsing system can be 
trivially implemented as a boolean circuit. 

Some of the material that has been presented is new, some other material has 
been included so as to make the book coherent and self-contained. Our specific 
research contributions are summarized in Section 15.2. But first we make 
some general remarks in 15.1, drawing together the results from the different 
topics that were discussed. Some ideas for future research are presented in 
15.3. 



15.1 Some general remarks 

Many different parsing algorithms can be found in the computer science and 
computational linguistics literature. Algorithms differ a lot with respect to the 
languages in which they are expressed, data structures used, degree of formal- 
ity, class of grammars that can be handled, etc. Things get more complicated 
- and more varied - when we consider parallel, rather than sequential algo- 
rithms. 

A useful, if not necessary, starting point for comparing the relative merits 
of different parsing algorithms is a description of those algorithms in a com- 
mon formalism. In this book we propose parsing schemata as a framework 
for description and comparison of parsing algorithms. We have given numer- 
ous examples of parsing algorithms, both sequential and parallel, that can 
be described relatively straightforwardly within the parsing schemata framework. 
Moreover, we have also given some examples of cross-fertilization where 
properties of different algorithms can be mixed, once the correspondence in 
underlying structure has been uncovered. 

A second advantage of the use of parsing schemata is that it allows us to 
divide the parsing problem into two smaller, less complicated problems. Pars- 
ing schemata constitute a well-defined level of abstraction between grammars 
and algorithms. An implicit specification of the correct syntactic analyses of 
a sentence is given by the grammar; a parsing algorithm gives an explicit 
recipe for computing these. Parsing schemata define the steps that have to 
be taken, but without specifying data structures, control structures and com- 
munication structures. Efficient parsers may involve a lot of such details. By 
using a parsing schema as a high-level specification one can separate the issues 
that relate to syntactic analysis from the issues that relate to the structure 
of efficient programs. 
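
The separation of schema and control can be made concrete with a minimal sketch. It assumes a CYK-like schema for a grammar in Chomsky Normal Form; the function names `cyk_steps` and `closure`, the dictionary encoding of the grammar and the toy grammar are hypothetical and not taken from the text. The schema contributes only the initial items and the deduction steps; the engine decides the order in which items are processed.

```python
from collections import deque


def cyk_steps(unary, binary):
    """Return (initial, combine) for a CYK-like schema (hypothetical encoding)."""

    def initial(words):
        # Items (A, i, i+1) for every production A -> words[i].
        return {(a, i, i + 1) for i, w in enumerate(words) for a in unary.get(w, ())}

    def combine(item, chart):
        # All items deducible from `item` together with one adjacent chart item.
        x, i, j = item
        for (y, k, l) in list(chart):
            if j == k:  # item is the left antecedent
                for a in binary.get((x, y), ()):
                    yield (a, i, l)
            if l == i:  # item is the right antecedent
                for a in binary.get((y, x), ()):
                    yield (a, k, j)

    return initial, combine


def closure(initial, combine, words, lifo=False):
    """Compute all valid items; `lifo` only changes the processing order."""
    agenda = deque(initial(words))
    chart = set()
    while agenda:
        item = agenda.pop() if lifo else agenda.popleft()
        if item in chart:
            continue
        chart.add(item)
        agenda.extend(combine(item, chart))
    return chart


# Toy grammar (an assumption): S -> A B | A S, A -> a, B -> b.
initial, combine = cyk_steps({"a": {"A"}, "b": {"B"}},
                             {("A", "B"): {"S"}, ("A", "S"): {"S"}})
fifo = closure(initial, combine, ["a", "a", "b"])
lifo = closure(initial, combine, ["a", "a", "b"], lifo=True)
```

Both agenda disciplines arrive at the same set of valid items, which is the point of the abstraction: the schema fixes what can be deduced, while data structures and control order remain implementation decisions.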

The absence of algorithmic detail is both an asset and a liability. There 
is a gain in conceptual clarity. The essential properties of an algorithm are 
captured more easily, simply because a lot of detail is absent. The other side 
of the coin, however, is that it is not a priori clear whether a schema can be 
implemented efficiently. We have used parsing schemata most successfully for 
the description, reconstruction, optimization and cross-fertilization of existing 
algorithms that were known to be efficient. The parsing schemata framework 
offers only limited insights into the efficiency of possible implementations. 

We have worked out a single concept, on a theoretical level in Part II and 
in a series of applications in Part III. It is this combination that, in our view, 
constitutes the main value of this book. 

Theory makes abstractions, and application requires detail. At some 
places, most prominently in Chapter 4, we have noted that these two in- 
terests can be at odds with each other. If our only concern had been to come 
up with an elegant framework of how parsing works in theory, we could 
interpret an item set as a quotient over a congruence relation on a set of trees, 
and not worry about the practical details. If, on the other hand, our only 
purpose had been to provide a practical notation for conceptually clear de- 
scriptions of parsing algorithms, the underlying mathematics could simply 
be deleted. But without such a theoretical understanding, it could have been 
just a coincidence that sensible parsing schemata can be drawn up for all the 
algorithms that we have studied. Armed with a theoretical foundation, we 
can claim that the framework applies to constructive parsing in general. 

The theoretical foundation and practical applications reinforce each other, 
so that the value of the parsing schemata framework is more than the sum 
of both parts. 



15.2 Research contributions of this book 

A formalization of the notion of parsing schemata has been given in Chapters 
3 and 4. Various kinds of relations between parsing schemata were defined 
in Chapters 5 and 6. Refinement and generalization, in Chapter 5, are used 
to obtain qualitative improvements in schemata. By making smaller steps 
and producing more intermediate results, the complexity of some schemata 
can be reduced and/or the applicable class of grammars enlarged. Filtering, 
as discussed in Chapter 6, is used for quantitative improvements: irrelevant 
parts of a deduction system can be discarded. A hierarchy of filters could be 
expressed concisely and elegantly at the abstract level of parsing schemata. 
As an extensive example, we have presented various variants of Earley’s algo- 
rithm, de Vreught and Honig’s algorithm and the (generalized) LC algorithm 
within a single taxonomy. 
In Chapter 8 we have extended parsing schemata to unification grammars 
by adding feature structures to a context-free backbone. As a result, we have 
obtained a formalism in which feature percolation in parsing algorithms for 
unification grammars can be defined explicitly in a simple way. 

The most interesting conclusion from Chapter 9 is that - even though 
context-free syntax is generally considered unimportant for unification grammars - 
context-free parsing techniques have a role to play in the construction 
of efficient unification grammar parsers. The efficiency of a parser can be 
enhanced by extracting a context-free backbone from a unification grammar 
that includes more than just a category feature. 

The major contribution of Chapters 10 and 11 is the specification, cor- 
rectness proof and complexity analysis of predictive Head-Corner parsing 
schemata. It was proven that despite the increase in administrative burden, 
the worst-case complexity of Head-Corner parsers is not worse than the 
complexity of conventional Earley parsers (provided that proper data structures 
are used). 

The Head-Corner parser for unification grammars that was presented here 
does not provide the last and ultimate truth about Head-Corner parsing of 
natural languages. Interesting work in this area is going on elsewhere. Our 
prime objective here was to show that parsing schemata are an effective tool 
to get a formal grip on highly complicated algorithms. 

In Chapter 12 we presented Tomita’s Generalized LR parser, with the 
purpose of showing that the notion of a parsing schema can also be applied 
to parsers with an algorithmic structure that is rather different from the 
various types of chart parsers discussed so far. Having clarified the close 
relation between the algorithms of Earley and Tomita, we could cross-fertilize 
the bottom-up parallelization of Earley with the graph-structured stack of 
Tomita. The resulting parallel bottom-up Tomita parser, that was presented 
in Chapter 13, is a successful example of combining properties of different 
algorithms with related underlying parsing schemata. 

In Chapter 14 we have shown that boolean circuits provide a suitable 
abstract machine model for (massively) parallel implementations of parsing 
schemata. As an exemplary non-trivial case, Rytter’s logarithmic-time 
parsing algorithm has been treated in detail. 



15.3 Ideas for future research 

The central notion of parsing schemata has been discussed in sufficient detail, 
but there are always some side issues that raise further questions. Some of 
these could become topics of substantial further research. 
• Efficient parsing of unification grammars is an issue that attracts a lot of 
attention. Unification is an expensive operation that accounts for most 
of the total processing time of a unification grammar parser. Hence any 
increase in the efficiency of unification speeds up the parser by almost 
the same factor. Much effort has been spent on speeding up unification, 
and reasonably efficient unification algorithms are available now. 

Another way to increase the efficiency of unification grammar parsing is to 
limit the number of unifications that have to be carried out. To this end, 
one can apply the filtering techniques known from context-free parsing. 
There are indications that considerable savings can be obtained by extend- 
ing the context-free backbone of a unification grammar with some key fea- 
tures. The optimal boundary between phrasal and functional constraints, 
as these are called by Maxwell and Kaplan [1993], is a subject that merits 
a more structured investigation. 

• Head-Corner parsing has been described on a rather theoretical level. In 
the area of Natural Language Processing, Head-Corner parsers have been 
successfully employed (see, e.g., [van Noord, 1996]). More remains to be said 
about the conditions under which, and the reasons why, Head-Corner parsers 
perform well in practical applications. 

• This study did not shed much light on the feasibility of parallel parsing. 
The results of the PBT study in Chapter 13 are somewhat inconclusive. 
The simulation experiment was moderately encouraging, but indeed only a 
simulation. Real parallel parser implementations (e.g. [Thompson, 1994]) 
report moderate, but not really convincing successes. 

The question whether parallel parsing is practically feasible, therefore, is 
still open for debate. 

• Boolean circuits have been introduced as an abstract parallel machine 
model, rather than a serious proposal for parallel implementation. Chapter 
14 presents ideas and examples, rather than a systematic investigation. On 
a theoretical level, the boolean circuit model could serve as a basis for a more 
thorough treatment of the complexity of parsing schemata. 
pHC(UG) 248-250 
pLC 210, 211, 217-219, 225 
correctness of - 211-216 
pLC’ 218, 219 
PS 55 
QCYK 67 
R2 98 
Rbb 325 
Rytter 97, 98 
sHC 233-236, 243, 244, 252 
complexity of - 236-242 
correctness of - 234-236 
sHC’ 230-236 
sHC” 234-236 
sLC 216-219 
SLR(1) 285-287 
TCYK 56, 91 
UG 161, 162, 164, 192, 250 




